AD-769  169 


A NEW TECHNIQUE IN THE OPTIMIZATION
OF EXPONENTIAL QUEUEING SYSTEMS


Steven A. Lippman

California University


Prepared  for: 

Air  Force  Office  of  Scientific  Research 
October  1973 


DISTRIBUTED BY:

National Technical Information Service
U. S. DEPARTMENT OF COMMERCE

5285 Port Royal Road, Springfield Va. 22151




Security Classification




DOCUMENT CONTROL DATA - R&D

(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)


1. ORIGINATING ACTIVITY (Corporate author)

Western Management Science Institute
University of California, Los Angeles


2a. REPORT SECURITY CLASSIFICATION

Unclassified 


2b.  GROUP 


3  REPORT  TITLE 


A  New  Technique  in  the  Optimization  of  Exponential  Queueing  Systems 


4. DESCRIPTIVE NOTES (Type of report and inclusive dates)


5. AUTHOR(S) (First name, middle initial, last name)

Steven  A.  Lippman 


6. REPORT DATE


October  1973 


7a. TOTAL NO. OF PAGES


54 


7b.  NO  OF  REFS 


2A. 


8a. CONTRACT OR GRANT NO.

AFOSR-72-2349a

b. PROJECT NO.


9a. ORIGINATOR'S REPORT NUMBER(S)


Working  Paper  No.  211 

9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report)


13. ABSTRACT


We consider the problem of controlling M/M/c queueing systems with c ≥ 1. By providing a new definition of the time of transition, we enlarge the standard set of decision epochs and obtain a preferred version of the n-period problem in which the times between transitions are exponential random variables with constant parameter. Using this new technique, we are able to utilize the inductive approach in a manner characteristic of inventory theory. The efficacy of the approach is then demonstrated by finding the form of an optimal policy for four quite distinct models that have appeared in the literature, namely those of (i) McGill, (ii) Miller-Cramer, (iii) Crabill-Sabeti, and (iv) Low. Of particular note, our analysis establishes that an (s,S) or control-limit policy is, as previously conjectured, optimal for an M/M/c queue with switching costs and removable servers. In addition, it is shown for the Miller-Cramer model that a policy optimal for all sufficiently small discount factors can be obtained from the usual average cost functional equation without recourse to further computation.
I. INTRODUCTION


Recently, Prabhu and Stidham [19] presented an excellent synthesis and survey of the literature on the optimal control of queueing systems. There, the authors clearly articulated the need for effecting a unified treatment, if not a unified theory, of the optimal control of queueing systems, in contrast to the ad hoc manner that has characterized development within the field to date.

With a view towards the goal of providing a unified treatment, a new definition of a transition for exponential or Markovian queueing systems (that is, systems with Poisson arrivals and exponential service times) is introduced in order to facilitate the use of the inductive approach on the finite horizon problems in attempting to specify the form of an optimal policy. Quite simply, this new definition merely stipulates that the exponential holding times between transitions (which normally entail a change in state) have constant parameter. Thus, the times between transitions are independent not only of the control policy employed but also of the state of the system. As will be demonstrated, this extraordinarily simple device yields both aesthetic and pragmatic benefits.

As to the aesthetic benefits, we claim that the version of the n-period problem as defined herein is intrinsically more meaningful than the n-period problem induced by the standard definition of a transition, for it corresponds to a finite horizon problem of some given expected duration.


Pragmatically, the new definition enables us to readily obtain many new results while simultaneously extracting as a byproduct a number of those that have, with considerable ingenuity and difficulty, been previously established. This is done by applying the inductive approach to show that various functions of the n-period return function are convex and/or monotone.

In particular, four distinct queueing models are considered. First, and foremost, is McGill's [16] M/M/c system with removable servers. Here we establish, for the first time, the optimality of an (s,S) or control-limit policy for the finite and infinite horizon problems, both with and without discounting. It is shown that if $n \le +\infty$ periods remain, there are $i$ customers in the system, and the number of servers "on" is not between the boundaries $s_{n,i}$ and $S_{n,i}$, then one should turn on or off just enough servers to reach the boundary; otherwise, one should not change the number of servers on.

The various researchers who have investigated the other three models we consider all adopted maximization of the long-run average expected reward per unit time as their criterion of optimality. As in the example of the M/M/c queue with removable servers, we establish the form of an optimal policy for the finite and infinite horizon problems, both with and without discounting. Moreover, in all three of the models we establish the existence of very strong planning horizons. The three models referred to are: (i) Cramer's [7] extension of Miller's [18] M/M/c system with finite queue capacity, in which the customers are distinguished by the reward associated with their acceptance into the queue; (ii) Crabill [5,6] and Sabeti's [20] M/M/1 system, in which the server can operate at any one of a finite number of service rates, with a higher cost associated with faster service rates and a linear holding cost in addition to a reward for service completions; and (iii) Low's [13,14] M/M/c finite capacity queue, in which the decision maker (in effect) dynamically selects the customer arrival rate so as to balance the linear holding costs against the higher entrance fees that accompany slower arrival rates. This approach will not prove beneficial in all instances; in particular, the approach has not proved useful in analyzing Cramer's models m3 and [7, pp. 38-57].

In summary, the purpose of this paper is twofold. First, a redefinition of transitions is proposed for exponential queueing systems in order to bring into being both a more meaningful and a technically more useful version of the n-period problem and, simultaneously, in order to achieve some small unification in the treatment of exponential queueing systems.

The second purpose is to establish new results for four specific models which occupy a rather prominent position in the field vis-a-vis other models to be found in the literature.

The four models are considered in sections 3 through 6, while notation and a more detailed explanation of our new definition of a transition are presented in section 2.
II. THE NEW SET OF DECISION EPOCHS


In pursuing our investigation of the optimal control of exponential queueing systems, presentation of our approach is facilitated by temporarily considering a larger class of problems known as semi-Markov decision processes (SMDP). We begin by defining a SMDP and use the M/M/c queue with removable servers to provide a concrete illustration of our many definitions.

A SMDP is specified by five objects: a state space $S$, an action space $A = \times_{s \in S} A_s$, a law of motion $q$, a transition time $t$, and a reward $r$. Whenever (and however) the system is in state $s$ and we choose action $a$, three things happen: (1) the system moves to a new state selected according to the probability distribution $q(\cdot|s,a)$; (2) conditional on the event that the new state is $s'$, the length of time it takes the system to move to state $s'$ is a nonnegative random variable with probability distribution $t(\cdot|s,a,s')$; and (3) conditional on the event that the new state is $s'$ and the transition takes $t$ time units, we receive a reward $r(\tau,t|s,a,s')$ by time $\tau \le t$. Typically, the reward is composed of a continuous (as a function of $\tau$) component and a jump which is received either at $\tau = 0$ or at $\tau = t$. After the transition to $s'$ occurs, a new action $a' \in A_{s'}$ is chosen and the process continues in the obvious manner. The decision epochs are the times of transition.

In the context of a SMDP, we shall assume that the time between transitions is an exponential random variable with parameter $\lambda_{s,a}$; that is, $t(x|s,a,s') = 1 - e^{-\lambda_{s,a}x}$. Moreover, we require that there be a finite upper bound on the set of $\lambda$'s to ensure that only a finite number of transitions will occur in a finite amount of time.

Letting $\alpha \ge 0$ be the interest rate used for discounting, so that a reward $r$ received at time $t$ has present value $re^{-\alpha t}$, we define $V_{n,\alpha}(s)$ to be the total expected $\alpha$-discounted reward that can be obtained during the last $n$ transitions when starting from state $s$ and following an optimal policy. (When it is clear that the value of $\alpha$ is fixed, we will often delete the $\alpha$ and simply write $V_n(s)$ rather than $V_{n,\alpha}(s)$.) Setting $V_{0,\alpha}(s) = 0$, it is clear that we have the following recursive equations for $V_{n,\alpha}$:


(1) $$V_{n+1,\alpha}(s) = \max_{a \in A_s} \left\{ r_\alpha(s,a) + \int_S \frac{\lambda_{s,a}}{\alpha+\lambda_{s,a}}\, V_{n,\alpha}(s')\, dq(s'|s,a) \right\},$$

where

$$r_\alpha(s,a) = \int_S \left\{ \int_0^\infty \left[ \int_0^t e^{-\alpha\tau}\, dr(\tau,t|s,a,s') \right] \lambda_{s,a} e^{-\lambda_{s,a} t}\, dt \right\} dq(s'|s,a).$$

Of course, $r_\alpha(s,a)$ is the expected $\alpha$-discounted reward earned during one transition when starting from state $s$ and choosing action $a$.
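The two ingredients of (1) can be checked in the simplest case. The sketch below is illustrative only and assumes a reward that accrues continuously at a constant rate $\rho$ during an exponential holding time with parameter $\lambda_{s,a}$, with no jump; then $r_\alpha(s,a) = \rho/(\alpha+\lambda_{s,a})$, and the factor $\lambda_{s,a}/(\alpha+\lambda_{s,a})$ in (1) is $E[e^{-\alpha T}]$ for the holding time $T$. The function names and the parameter `rho` are hypothetical; the Monte Carlo estimates should match the closed forms up to sampling error.

```python
import math
import random


def closed_forms(rho: float, lam: float, alpha: float) -> tuple[float, float]:
    """One-transition reward and discount factor for an Exp(lam) holding
    time with a reward accruing at constant rate rho (no jump)."""
    reward = rho / (alpha + lam)      # r_alpha(s, a) for this reward structure
    discount = lam / (alpha + lam)    # E[exp(-alpha * T)], T ~ Exp(lam)
    return reward, discount


def monte_carlo(rho: float, lam: float, alpha: float, n: int = 200_000,
                seed: int = 1) -> tuple[float, float]:
    """Estimate the same two quantities by simulating holding times.
    Requires alpha > 0 (the reward integral below divides by alpha)."""
    rng = random.Random(seed)
    reward_sum = 0.0
    discount_sum = 0.0
    for _ in range(n):
        t = rng.expovariate(lam)      # holding time until the transition
        # integral_0^t rho * exp(-alpha * tau) d tau
        reward_sum += rho * (1.0 - math.exp(-alpha * t)) / alpha
        discount_sum += math.exp(-alpha * t)
    return reward_sum / n, discount_sum / n


if __name__ == "__main__":
    print(closed_forms(3.0, 2.0, 0.1))
    print(monte_carlo(3.0, 2.0, 0.1))
```

Since $\int_0^T \rho e^{-\alpha\tau}\,d\tau = \rho(1-e^{-\alpha T})/\alpha$, averaging over $T \sim \mathrm{Exp}(\lambda_{s,a})$ gives $\rho/(\alpha+\lambda_{s,a})$, which is what the simulation approximates.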

If $\langle V_{n,\alpha}(s) \rangle$ possesses a limit for each $s$ as $n$ tends to infinity, then we denote this limit by $V_\alpha(s)$. Throughout the remainder of the paper, we will refer to any policy $\pi$ whose $\alpha$-discounted return $U_\alpha$ equals $V_\alpha$ as $\alpha$-optimal, $\alpha > 0$. If $U_\alpha - V_\alpha$ goes to zero as $\alpha$ goes to zero, then $\pi$ is said to be 0-optimal; furthermore, if for some $\alpha' > 0$ we have $U_\alpha = V_\alpha$ for $0 < \alpha \le \alpha'$, then it is termed strongly optimal. Finally, it is said to be average optimal if $\liminf_{n\to\infty} U_{n,0}/n = \limsup_{n\to\infty} V_{n,0}/n = \bar V$.


As a concrete illustration of the above, consider the M/M/c queueing system with removable servers [16]. Customers arrive according to a Poisson process with rate $\lambda$. There are $c < \infty$ independent exponential servers, each with rate $\mu$, and the queue has an unlimited capacity. The cost structure consists of three parts: a holding cost $h$ per customer per unit time, a running cost $r$ per server per unit time, and a switching cost $K^+$ ($K^-$) that is incurred each time a server is turned on (off).

Thus, if there are $x$ servers on and it is decided to have $y$ servers on, then the switching cost is given by

$$K(x,y) = \begin{cases} K^+(y-x), & \text{if } y > x, \\ K^-(x-y), & \text{if } y \le x. \end{cases}$$

Taking the state of the system to be $(i,x)$, where $i$ is the number of customers in the system and $x$ is the number of servers on, we have

$$S = \{(i,x) : i = 0,1,2,\ldots;\ x = 0,1,2,\ldots,c\}, \qquad A_s = A = \{0,1,2,\ldots,c\},$$

$$q((i+1,y)\,|\,(i,x),y) = \frac{\lambda}{\lambda_{(i,x),y}} \quad\text{and}\quad q((i-1,y)\,|\,(i,x),y) = \frac{\mu(i \wedge y)}{\lambda_{(i,x),y}}.$$


Of course, $\lambda_{(i,x),y} = \lambda + \mu(i \wedge y)$, so that the holding times depend upon the state (through $i$) and the action chosen. Finally, $r(\tau,t\,|\,(i,x),y,(i',y)) = K(x,y) + (hi+ry)\tau$, so that $r_\alpha((i,x),y) = K(x,y) + (hi+ry)/(\alpha+\lambda+(i\wedge y)\mu)$.


The most obvious disadvantage of the problem formulation as given above is that the expected length of the n-period problem is not a constant (for fixed n), but rather a variable that depends heavily upon both the control policy implemented and the initial state of the system.




(Roughly speaking, more customers in the system and more servers on will result in a shorter time until the completion of the $n$th transition.) Thus, the n-period problem does not correspond in any strong sense to a problem in which one seeks to minimize the expected discounted costs over some time horizon of finite (expected) length $T$. More sharply, we see that the n-period problem with the standard formulation is not, as one would naturally presume it to be, a continuous time analog of the discrete time n-period problem where the periods are of constant and equal length.

While the period lengths in a SMDP must necessarily be random variables, the problem can be reformulated so that the exponential period lengths all have the same parameter, independent of both the control policy employed and the initial state. Hence, the aesthetic-philosophical need to more closely approximate a problem of fixed length (or at least of fixed expected length) leads us to advocate the necessity of a reformulation.

A second reason prompting us to reformulate the n-period problem is the fact that the standard formulation dissipates desirable properties, such as monotonicity and concavity, of the return function $V_{n,\alpha}(\cdot)$. In turn, this leads to "foolish decisions being optimal." For instance, if $K^- > r/\lambda$ and $\mu/\lambda > h/r$ in the removable server model, then $V_{1,\alpha}((1,1)) < V_{1,\alpha}((0,1))$ for all $\alpha > 0$ small, so it is cheaper to incur the holding cost simply in order to increase the transition rate and hasten the end of the horizon. (With one transition remaining, keeping the server on costs $r/(\alpha+\lambda)$ from $(0,1)$, which beats turning it off since $K^- > r/\lambda$, but only $(h+r)/(\alpha+\lambda+\mu)$ from $(1,1)$; as $\alpha \to 0$ the latter is smaller exactly when $\mu/\lambda > h/r$.) On the other hand, it is shown in section 3 that $V_\alpha(\cdot)$ is strictly increasing in the number of customers in the system, so that the "improper" standard formulation has lost the monotonicity of $V_{n,\alpha}((\cdot,x))$. Similar aberrations can occur in the policy itself.
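The first "foolish decision" example can be reproduced numerically. A minimal sketch, assuming hypothetical parameter values chosen only to satisfy $K^- > r/\lambda$ and $\mu/\lambda > h/r$ ($\lambda = 1$, $\mu = 4$, $h = 1$, $r = 2$, $K^+ = 1$, $K^- = 3$, $c = 1$, $\alpha = 0.01$): with $V_{0,\alpha} \equiv 0$ in the standard formulation, $V_{1,\alpha}((i,x)) = \min_y r_\alpha((i,x),y)$, using the state- and action-dependent holding-time rate $\lambda + (i\wedge y)\mu$ of the removable server example. The function names are illustrative.

```python
def switching_cost(x: int, y: int, k_on: float, k_off: float) -> float:
    """K(x, y): K^+ per server turned on, K^- per server turned off."""
    return k_on * (y - x) if y > x else k_off * (x - y)


def one_step_cost(i, x, y, *, lam, mu, h, r, k_on, k_off, alpha):
    """r_alpha((i, x), y) in the standard formulation, where the holding
    time has rate lam + min(i, y) * mu (state- and action-dependent)."""
    rate = lam + min(i, y) * mu
    return switching_cost(x, y, k_on, k_off) + (h * i + r * y) / (alpha + rate)


def v1(i, x, c, **params):
    """V_{1,alpha}((i, x)) = min over y of r_alpha((i, x), y), since V_0 = 0."""
    return min(one_step_cost(i, x, y, **params) for y in range(c + 1))


if __name__ == "__main__":
    # Hypothetical values with K^- > r/lam and mu/lam > h/r.
    params = dict(lam=1.0, mu=4.0, h=1.0, r=2.0, k_on=1.0, k_off=3.0, alpha=0.01)
    print(v1(1, 1, c=1, **params), v1(0, 1, c=1, **params))
```

With these values the sketch gives $V_{1,\alpha}((1,1)) = 3/5.01 \approx 0.599$ against $V_{1,\alpha}((0,1)) = 2/1.01 \approx 1.980$, so the state with an extra customer looks cheaper, which is exactly the aberration described above.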

We now show how to change the set of decision epochs so as to obtain constant $\lambda$'s while leaving the underlying stochastic process unchanged. To begin, assume for simplicity in presentation that $q(s|s,a) = 0$, and define

$$\Lambda = \sup_{s,a} \lambda_{s,a}.$$

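Next, redefine the law of motion $q$ as follows:

$$q'(s'|s,a) = \begin{cases} 1 - \dfrac{\lambda_{s,a}}{\Lambda}, & \text{if } s' = s, \\[6pt] \dfrac{\lambda_{s,a}}{\Lambda}\, q(s'|s,a), & \text{if } s' \ne s. \end{cases}$$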


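This change in the law of motion permits us to redefine the $\lambda$'s by setting them all equal to $\Lambda$. Taken together, these changes in $q$ and the $\lambda$'s leave the underlying stochastic process unchanged, as the infinitesimal generator is unchanged (the rate of a jump from $s$ to $s' \ne s$ is $\Lambda \cdot (\lambda_{s,a}/\Lambda)\, q(s'|s,a) = \lambda_{s,a}\, q(s'|s,a)$ in both formulations) and it uniquely determines the pure jump Markov process [4, Ch. 8].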

Having increased $\lambda_{s,a}$ to $\Lambda$, the expected length of the time until a transition occurs has been reduced from $1/\lambda_{s,a}$ to $1/\Lambda$. Of course, the number of transitions until a change of state occurs has changed from the constant 1 to a geometric random variable with parameter $\lambda_{s,a}/\Lambda$. Combining these two facts shows (again) that the expected time until a change of state occurs is $[\lambda_{s,a}/\Lambda]^{-1} \cdot 1/\Lambda = 1/\lambda_{s,a}$, as desired.
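The change of decision epochs is mechanical for a finite model. The helper below is a hypothetical sketch (its names and the two-state example are illustrative, not from the paper): given rates $\lambda_{s,a}$ and a law of motion $q(\cdot|s,a)$ with $q(s|s,a) = 0$, it forms $\Lambda$ and $q'$ as above, and it recovers the expected time until the state actually changes as (expected number of transitions) $\times\, 1/\Lambda$. Exact rational arithmetic with `fractions.Fraction` makes the identity with $1/\lambda_{s,a}$ visible without rounding; it assumes every $\lambda_{s,a} > 0$.

```python
from fractions import Fraction


def uniformize(rates, q):
    """Return (Lam, q_prime) for the reformulated problem.

    rates: dict (s, a) -> lambda_{s,a} > 0
    q:     dict (s, a) -> {s_next: prob}, with no self-transitions
    """
    lam_max = max(rates.values())                  # Lambda = sup lambda_{s,a}
    q_prime = {}
    for (s, a), lam in rates.items():
        # real transitions are thinned by lambda_{s,a} / Lambda
        row = {s_next: lam / lam_max * p for s_next, p in q[(s, a)].items()}
        row[s] = 1 - lam / lam_max                 # fictitious transition
        q_prime[(s, a)] = row
    return lam_max, q_prime


def expected_time_to_change(lam_max, q_prime, s, a):
    """Geometric number of transitions (success prob 1 - q'(s|s,a)),
    each lasting 1/Lambda on average."""
    stay = q_prime[(s, a)][s]
    expected_transitions = 1 / (1 - stay)
    return expected_transitions / lam_max


if __name__ == "__main__":
    rates = {(0, "a"): Fraction(1), (1, "a"): Fraction(3)}
    q = {(0, "a"): {1: Fraction(1)}, (1, "a"): {0: Fraction(1)}}
    lam, qp = uniformize(rates, q)
    print(lam, qp)
    print([expected_time_to_change(lam, qp, s, "a") for s in (0, 1)])
```

For these two alternating states with $\lambda_{0,a} = 1$ and $\lambda_{1,a} = 3$, one gets $\Lambda = 3$, $q'(0|0,a) = 2/3$, and expected times to a change of state of $1$ and $1/3$, i.e. $1/\lambda_{s,a}$ as in the argument above.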



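In the context of the M/M/c queue with removable servers, we have

$$\Lambda = \lambda + c\mu \quad\text{and}\quad r_\alpha((i,x),y) = K(x,y) + \frac{hi+ry}{\alpha+\Lambda},$$

so that the recursive equations for the reformulated problem are

(2) $$V_{n+1}(i,x) = \min_y \left\{ K(x,y) + \frac{1}{\alpha+\Lambda}\Big[ hi + ry + \lambda V_n(i+1,y) + \mu(i\wedge y) V_n(i-1,y) + \big(\Lambda - \lambda - \mu(i\wedge y)\big) V_n(i,y) \Big] \right\}.$$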
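Recursion (2) is easy to iterate numerically once the queue is truncated. The sketch below is not from the paper: it assumes a hypothetical buffer limit `i_max` at which arriving customers are lost (so that the state space is finite), takes illustrative parameter values, and provides a helper for inspecting the convexity of $V_{n,\alpha}(i,\cdot)$ that Theorem 1 asserts for the untruncated model.

```python
def switching_cost(x, y, k_on, k_off):
    """K(x, y) from section 2: K^+ per server turned on, K^- per server turned off."""
    return k_on * (y - x) if y > x else k_off * (x - y)


def value_iteration(n, *, c, lam, mu, h, r, k_on, k_off, alpha, i_max):
    """Apply recursion (2) n times starting from V_0 = 0.

    Returns V with V[i][x] = V_{n,alpha}(i, x) for 0 <= i <= i_max, 0 <= x <= c.
    ASSUMPTION (not in the paper): the queue is truncated at i_max and an
    arrival that finds i_max customers is lost, so the arrival term at
    i_max is evaluated at i_max itself.
    """
    big_lam = lam + c * mu                          # Lambda = lambda + c * mu
    V = [[0.0] * (c + 1) for _ in range(i_max + 1)]
    for _ in range(n):
        J = [[0.0] * (c + 1) for _ in range(i_max + 1)]
        for i in range(i_max + 1):
            for y in range(c + 1):
                busy = min(i, y)                    # busy servers, i ^ y
                up = V[min(i + 1, i_max)][y]        # after an arrival (truncated)
                down = V[i - 1][y] if i > 0 else 0.0  # after a departure
                idle = big_lam - lam - busy * mu    # rate of fictitious transitions
                J[i][y] = (h * i + r * y + lam * up + busy * mu * down
                           + idle * V[i][y]) / (alpha + big_lam)
        # V_{n+1}(i, x) = min over y of K(x, y) + J_n(i, y)
        V = [[min(switching_cost(x, y, k_on, k_off) + J[i][y] for y in range(c + 1))
              for x in range(c + 1)]
             for i in range(i_max + 1)]
    return V


def is_convex(values, tol=1e-9):
    """True if the sequence has nonnegative second differences (up to tol)."""
    return all(values[k - 1] - 2 * values[k] + values[k + 1] >= -tol
               for k in range(1, len(values) - 1))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    V = value_iteration(50, c=3, lam=2.0, mu=1.0, h=1.0, r=0.5,
                        k_on=2.0, k_off=1.0, alpha=0.05, i_max=30)
    print([is_convex(row) for row in V[:6]])
```

The truncation is only a computational device; rows near `i_max` are affected by the lost-arrival boundary and are the least reliable guide to the untruncated model.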

As originally conceived, reformulation of this model entailed allowing idle servers to complete service on fictitious customers at the rate $\mu$, with the proviso that while we say that a transition has occurred if an idle server completes service, the state of the system does not change. More generally, one can imagine a bell that is triggered by an exponential clock with parameter $\Lambda$. A transition occurs if and only if the bell rings. Furthermore, the probability that the new state is $s'$, given that the system was in state $s$ and action $a$ was chosen, is given by $q'(s'|s,a)$ and is determined independently of the clock and of the past choices of $s'$.

In closing, we note that if a stationary policy is employed for the infinite horizon problem, then the new and the standard formulations are equivalent, as are their functional equations. Moreover, for every policy, Markovian or not, in the standard formulation, there corresponds a policy in the new formulation with the same sample paths (although not the same system history) and the same return function. Consequently, establishing the existence of a stationary policy that is optimal in the new formulation yields the same result for the standard formulation.

The efficacy of our new formulation is demonstrated in the next four sections.
III. THE M/M/c QUEUE WITH REMOVABLE SERVERS

In this section we consider the M/M/c queue with removable servers described in section 2. Using the inductive approach on the n-period problem with constant expected time between transitions, we begin by showing that the n-period return function $V_{n,\alpha}(i,x)$ as given in (2) is a convex function of $x$. From this, the optimality of a control-limit policy follows readily.

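Define $J_{n,\alpha}(i,y)$ by

$$J_{n,\alpha}(i,y) = \frac{1}{\alpha+\Lambda}\Big\{ hi + ry + \lambda V_{n,\alpha}(i+1,y) + \mu(i\wedge y) V_{n,\alpha}(i-1,y) + \mu\big(c-(i\wedge y)\big) V_{n,\alpha}(i,y) \Big\},$$

so that ($V_{0,\alpha} \equiv 0$)

$$V_{n+1,\alpha}(i,x) = \min_y \{ K(x,y) + J_{n,\alpha}(i,y) \}.$$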


Also, define $y_{n,\alpha}(i,x)$ to be the optimal decision (number of servers on) when $n$ periods remain, the discount factor is $\alpha$, and the current state is $(i,x)$. We say that $y_{n,\alpha}$ is a control-limit policy if there exist numbers $s_{n,\alpha}(i) \le S_{n,\alpha}(i)$ such that

$$y_{n,\alpha}(i,x) = \begin{cases} s_{n,\alpha}(i), & x \le s_{n,\alpha}(i), \\ x, & s_{n,\alpha}(i) < x < S_{n,\alpha}(i), \\ S_{n,\alpha}(i), & x \ge S_{n,\alpha}(i). \end{cases}$$

If, in addition, $s_{n,\alpha}(i) \le i$ for each $i$, then we say that $y_{n,\alpha}$ is a regular control-limit policy.
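The definition of a (regular) control-limit policy can be checked mechanically on a tabulated policy, for instance one obtained by recording a minimizing $y$ in (2). In the sketch below (hypothetical helper names; a policy is stored as a list whose $i$th entry is the row $y(i,0), \ldots, y(i,c)$), a row has the control-limit form exactly when it equals $\min(\max(x, s), S)$ with $s = y(i,0)$ and $S = y(i,c)$, since the three cases of the definition collapse to that expression whenever $0 \le s \le S \le c$.

```python
def control_limits(row):
    """Return (s, S) if row[x] = y(i, x), x = 0..c, has control-limit form, else None."""
    s, S = row[0], row[-1]            # y(i, 0) = s and y(i, c) = S under the definition
    if s > S:
        return None
    expected = [min(max(x, s), S) for x in range(len(row))]
    return (s, S) if list(row) == expected else None


def is_regular(policy):
    """policy[i] is the row y(i, .); regular means every row has
    control-limit form with lower limit s(i) <= i."""
    for i, row in enumerate(policy):
        limits = control_limits(row)
        if limits is None or limits[0] > i:
            return False
    return True


if __name__ == "__main__":
    print(control_limits([1, 1, 2, 3, 3]))                # (1, 3)
    print(is_regular([[0, 1, 2], [1, 1, 2], [2, 2, 2]]))  # True
```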
THEOREM 1: Given $\alpha > 0$, $n$, and $i$, the functions $V_{n,\alpha}(i,\cdot)$ and $J_{n,\alpha}(i,\cdot)$ are convex and $y_{n,\alpha}$ is a regular control-limit policy.

Proof: Defining $H_n(x,y) = K(x,y) + J_{n,\alpha}(i,y)$, we note that $H_0(\cdot,\cdot)$ is jointly convex, as $J_{0,\alpha}(i,\cdot)$ is linear, $K(\cdot,\cdot)$ is jointly convex, and the sum of convex functions is convex. Hence, $V_{1,\alpha}(i,x) = \min_y \{H_0(x,y)\}$ is a convex function of $x$. (Note that $H_n(\cdot,\cdot)$ convex in each variable is not sufficient; joint convexity is needed to apply this well known theorem on the minimum of a convex function.) Now assume that $V_{n,\alpha}(i,\cdot)$ is convex for each $i$. Then $J_{n,\alpha}(i,\cdot)$ is convex for each $i$, as the sum of convex functions is convex. Consequently, $H_n(\cdot,\cdot)$ is jointly convex, so that $V_{n+1,\alpha}(i,\cdot)$ is convex for each $i$.

The convexity of $J_{n,\alpha}(i,\cdot)$ coupled with the essentially linear form of $K(\cdot,\cdot)$ yields the existence of $s_{n,\alpha}(i) \le S_{n,\alpha}(i)$. To see that $s_{n+1,\alpha}(i) \le i$, simply note that $J_{n,\alpha}(i,i) \le J_{n,\alpha}(i,i+j) + \Lambda K^+ j/(\alpha+\Lambda)$, so that if action $i+j$ were optimal from state $(i,i)$ we would have

$$V_{n+1,\alpha}(i,i) = K^+ j + J_{n,\alpha}(i,i+j) > J_{n,\alpha}(i,i) \ge V_{n+1,\alpha}(i,i).$$

Q.E.D.

Theorem 1 can now be applied to yield the same results for the infinite horizon problem with $\alpha > 0$.

THEOREM 2: For $\alpha > 0$, $V_\alpha(i,\cdot)$ and $J_\alpha(i,\cdot)$ exist and are convex, $V_\alpha(\cdot,\cdot)$ is the unique solution to the functional equation of dynamic programming, and there is a stationary policy $y_\alpha$ that is $\alpha$-optimal. Moreover, $y_\alpha = \langle s_\alpha(i), S_\alpha(i) \rangle$, a regular control-limit policy.




Proof: To see that $V_\alpha = \lim_{n\to\infty} V_{n,\alpha}$ exists, simply note that $V_{n+1,\alpha} \ge V_{n,\alpha}$ and that

$$V_{n,\alpha}(i,x) \le \sum_{j=0}^{\infty} \left(\frac{\Lambda}{\alpha+\Lambda}\right)^{j} \left[ K^+c + K^-c + \frac{rc + h(i+j)}{\alpha+\Lambda} \right] \equiv B_i < \infty.$$

(Of course, $\{B_i\}$ is not bounded.) The convexity of $V_\alpha(i,\cdot)$ follows from Theorem 1 and the fact that the limit of convex functions is convex, whereas the uniqueness and the existence of an optimal stationary policy are immediate from Theorem 1 of [11]. These last facts, coupled with the convexity of $J_\alpha(i,\cdot)$, suffice to establish the existence of $\langle s_\alpha(i) \rangle$ and $\langle S_\alpha(i) \rangle$.

Q.E.D.

Establishing these results for the average cost case is slightly more delicate and, in particular, we will need to assume $\lambda < c\mu$ lest $\bar V$, the optimal return function, be infinite. Before proving that a control-limit policy is also optimal for the average cost problem, we need the following lemma, which asserts that all servers must be on when many periods remain, many customers are in the system, and $\alpha > 0$ is small.

LEMMA 1: There are numbers $N^* < \infty$, $I < \infty$, and $\alpha^* > 0$ such that $s_{n,\alpha}(i) = c$ whenever $n \ge N^*$, $i \ge I$, and $\alpha \le \alpha^*$.

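Proof: To begin, define $v_{n,\alpha}(i) = \min_x \{ V_{n,\alpha}(i+1,x) - V_{n,\alpha}(i,x) \}$. Letting $\xi = y_{n+1,\alpha}(i+1,x)$, we have, in turn,

$$\begin{aligned} V_{n+1,\alpha}(i+1,x) - V_{n+1,\alpha}(i,x) &\ge J_{n,\alpha}(i+1,\xi) - J_{n,\alpha}(i,\xi) \\ &\ge \frac{1}{\alpha+\Lambda}\Big\{ h + \lambda v_{n,\alpha}(i+1) + \mu(i\wedge\xi)\, v_{n,\alpha}(i-1) + \mu\big(c - ((i+1)\wedge\xi)\big)\, v_{n,\alpha}(i) \Big\}, \end{aligned}$$

so that

$$v_{n+1,\alpha}(i) \ge \frac{h}{\alpha+\Lambda} + \frac{\Lambda}{\alpha+\Lambda} \min\{ v_{n,\alpha}(i-1),\, v_{n,\alpha}(i),\, v_{n,\alpha}(i+1) \}.$$

Iterating this inequality gives

(5) $$v_{n+1,\alpha}(i) \ge \frac{h}{\alpha+\Lambda} \sum_{j=0}^{n\wedge i} \left(\frac{\Lambda}{\alpha+\Lambda}\right)^{j}.$$

Let $\hat V_{n,\alpha}$ be the return associated with choosing action $c$ when $n$ periods remain and acting in an optimal fashion thereafter. Then for $i \ge c$,

$$\begin{aligned} V_{n+1,\alpha}(i+1,x) - \hat V_{n+1,\alpha}(i+1,x) &= K(x,\xi) - K(x,c) + \frac{1}{\alpha+\Lambda}\Big\{ r(\xi-c) + \lambda[V_{n,\alpha}(i+2,\xi) - V_{n,\alpha}(i+2,c)] \\ &\qquad + \mu\xi[V_{n,\alpha}(i,\xi) - V_{n,\alpha}(i,c)] + \mu(c-\xi)[V_{n,\alpha}(i+1,\xi) - V_{n,\alpha}(i,c)] \Big\} \\ &\ge -cK^+ - \frac{rc}{\alpha+\Lambda} - (c-\xi)K^- + \frac{\mu(c-\xi)}{\alpha+\Lambda}\, v_{n,\alpha}(i) \\ &\ge -c\left[K^+ + K^- + \frac{r}{\alpha+\Lambda}\right] + \frac{\mu(c-\xi)}{\alpha+\Lambda}\, v_{n,\alpha}(i). \end{aligned}$$

But (5) permits us to choose $N^*$, $\alpha^*$, and $I$ so that the right hand side of the inequality above is strictly positive for all $n \ge N^*$, $i \ge I$, and $\alpha \le \alpha^*$ whenever $\xi < c$ (the right side of (5) increases to $h/\alpha$ as $n \wedge i \to \infty$, and $h/\alpha$ exceeds any fixed bound once $\alpha$ is small). Hence, we must have $y_{n+1,\alpha}(i+1,x) = \xi = c$ for all $n \ge N^*$, $i \ge I$, and $\alpha \le \alpha^*$.

Q.E.D.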


Note that Equation 5 allows explicit computation of the numbers $N^*$, $I$, and $\alpha^*$.

Because the optimal return function $\bar V$ turns out to be constant, another function to play the role of $V_\alpha$ is needed. Toward this end, define the functions $h$ and $J$ by

$$h(i,x) = \lim_{\alpha\to 0^+} \{ V_\alpha(i,x) - V_\alpha(0,0) \}$$

and

$$J(i,y) = \frac{1}{\Lambda}\Big\{ hi + ry + \lambda h(i+1,y) + \mu(i\wedge y)\, h(i-1,y) + \mu\big(c-(i\wedge y)\big)\, h(i,y) \Big\}.$$

THEOREM 3: If $\lambda < c\mu$, then $\bar V$ is constant and finite, $h$ exists and is finite, $h(i,\cdot)$ and $J(i,\cdot)$ are convex for each $i$, and there is a stationary regular control-limit policy $y = \langle s(i), S(i) \rangle$, with $s(i) = c$ for all $i$ sufficiently large, that is average optimal. Furthermore, $h$ satisfies the functional equation

(6) $$h(i,x) = \min_y \{ K(x,y) + J(i,y) \} - \bar V/\Lambda,$$

and any stationary policy that selects an action which minimizes the right side of (6) for each $s \in S$ is average optimal.

Proof: In view of Lemma 1, the set of policies that are $\alpha$-optimal for $\alpha \le \alpha^*$ is finite. Consequently, there is a sequence $\langle \alpha_m \rangle$ of discount factors with $\alpha_m \to 0$ and some control-limit policy $y$ that is $\alpha_m$-optimal for each $m$. Furthermore, the lower control limits $\langle s(i) \rangle$ satisfy $s(i) = c$ for all $i \ge I$. Thus, $y$ satisfies the hypotheses of Corollary 1 of [11], so that $y$ is average optimal and $\bar V$ is constant and finite.

The convexity of $h$ and $J$ follows from the fact that the limit of convex functions is convex, while the remainder of the results follow from Theorem 4 of [11].

Q.E.D.

In a forthcoming paper [9], it is shown that there is a strongly optimal control-limit policy. The paper will also contain a further characterization of the parameters $s_{n,\alpha}(i)$ and $S_{n,\alpha}(i)$.
IV. OPTIMAL CUSTOMER SELECTION IN AN M/M/c QUEUE

One of the earliest papers to appear in the literature of the optimal dynamic control of queueing systems was Miller's [17]. He considered an M/M/c system with $m$ customer classes in which each server had rate $\mu$, customer arrivals had rate $\lambda$, and $p_k$ was the probability that an arriving customer was from the $k$th class. There was a reward $r_k$, $r_1 < r_2 < \cdots < r_m$, associated with serving a customer of class $k$. Decisions were made at the times of arrival, whence the customer was either accepted into service in order to obtain the reward $r_k$ or rejected in order to keep available servers free. Pre-emption and backlogging (queueing) of customers were not allowed, and maximization of the expected reward earned per unit time over an infinite planning horizon was the criterion of optimality. Several years later, Cramer [7] improved upon this model by introducing a finite queue capacity and allowing an infinite number of customer classes.

The treatment of the model presented here represents a very slight generalization of Cramer's model in that we allow the queue capacity $Q$ to be either finite or infinite. In addition, we consider discounting and finite horizon problems.

To begin, let the set $X$ of customer classes be a measurable subset of the interval $[1,K]$, and assume that the reward function $r: X \to \mathbb{R}$ is a strictly increasing function with $r_1 \ge 0$ and $r_K < \infty$. Denote by $p$ the measure on $X$; that is, if $X$ were countable, then $p_x$ is the probability that the next arrival will be a customer of class $x$. Finally, add an artificial class 0 to $X$ with $r_0 = 0$ and $p_0 = 0$.


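Let the state of the system be $(i,x)$, where $x$ is the class of the customer seeking admittance and $i$ is the number of customers in the system not including the one seeking admittance, and denote acceptance by 1 and rejection by 0. Then, in accord with our previous notation, we have

$$S = \{ (i,x) : i = 0,1,2,\ldots,c+Q;\ x \in \{0\} \cup X \},$$

$A_{(i,x)} = \{0\}$ for $i = c+Q$ or $x = 0$, and $A_{(i,x)} = \{0,1\}$ otherwise,

$$q((i,y)\,|\,(i,x),0) = q((i+1,y)\,|\,(i,x),1) = \frac{\lambda\, p(dy)}{\Lambda},$$

$$q((i-1,0)\,|\,(i,x),0) = \frac{\mu(i\wedge c)}{\Lambda}, \qquad q((i,0)\,|\,(i,x),1) = \frac{\mu((i+1)\wedge c)}{\Lambda},$$

$$q((i,0)\,|\,(i,x),0) = \frac{\mu\big(c-(i\wedge c)\big)}{\Lambda}, \qquad q((i+1,0)\,|\,(i,x),1) = \frac{\mu\big(c-((i+1)\wedge c)\big)}{\Lambda}.$$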

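We also have $\Lambda = \lambda + c\mu$, $r_\alpha((i,x),1) = r_x$, and $r_\alpha((i,x),0) = 0$, so that the recursive equations for this model are ($V_{0,\alpha} \equiv 0$)

(7) $$V_{n+1,\alpha}(i,x) = \begin{cases} \max\{ r_x + V_{n,\alpha}(i+1),\ V_{n,\alpha}(i) \}, & i < c+Q,\ x \ne 0, \\ V_{n,\alpha}(i), & \text{otherwise}, \end{cases}$$

where

$$V_{n,\alpha}(i) = \frac{1}{\alpha+\Lambda}\Big\{ \lambda \int_X V_{n,\alpha}(i,y)\, p(dy) + \mu(i\wedge c)\, V_{n,\alpha}(i-1,0) + \mu\big(c-(i\wedge c)\big)\, V_{n,\alpha}(i,0) \Big\}.$$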


Our formulation implicitly assumes that the fee $r_x$ is collected at the time the customer enters the queue rather than at the time he commences service. To handle this latter formulation, simply set

(9) $$r_\alpha((i,x),1) = r_x \left[ \frac{c\mu}{\alpha+c\mu} \right]^{(i+1-c)^+},$$

as the customer will not enter service until $(i+1-c)^+$ other customers complete service, and each of the independent times between these service completions is exponential with parameter $c\mu$. Of course, when $Q = \infty$, the problem is of interest only if $r_\alpha$ is given by (9). Because the case $Q = \infty$ is so different from the case $Q < \infty$ (e.g., $V_{n,\alpha}(\cdot,x)$ is concave if $Q < \infty$ and convex if $Q = \infty$ and $c = 1$), we treat the two cases separately, beginning with $Q < \infty$.
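Formula (9) discounts the fee by $E[e^{-\alpha W}]$, where the wait $W$ until service starts is a sum of $(i+1-c)^+$ independent exponential inter-completion times with parameter $c\mu$. The sketch below (illustrative function names and parameters, not from the paper) evaluates (9) and compares it with a direct simulation of that wait.

```python
import math
import random


def discounted_fee(r_x, i, c, mu, alpha):
    """(9): fee r_x collected when service starts, discounted at rate alpha."""
    k = max(i + 1 - c, 0)             # completions the arriving customer waits for
    return r_x * (c * mu / (alpha + c * mu)) ** k


def simulated_fee(r_x, i, c, mu, alpha, n=100_000, seed=7):
    """Monte Carlo estimate of E[r_x * exp(-alpha * W)] with W ~ Gamma(k, c*mu)."""
    rng = random.Random(seed)
    k = max(i + 1 - c, 0)
    total = 0.0
    for _ in range(n):
        # while all c servers are busy, completions occur at rate c * mu
        wait = sum(rng.expovariate(c * mu) for _ in range(k))
        total += r_x * math.exp(-alpha * wait)
    return total / n


if __name__ == "__main__":
    print(discounted_fee(5.0, 3, 2, 1.0, 0.5), simulated_fee(5.0, 3, 2, 1.0, 0.5))
```

For example, with $r_x = 5$, $i = 3$, $c = 2$, $\mu = 1$, and $\alpha = 0.5$, the arriving customer waits for two completions and (9) gives $5 \cdot (2/2.5)^2 = 3.2$.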

By defining $R_{n+1,\alpha}(i) = V_{n,\alpha}(i) - V_{n,\alpha}(i+1)$, Equation 7 reveals the rather obvious fact that it is optimal to accept a customer of class $x$ when $n$ periods remain, the discount factor is $\alpha \ge 0$, and there are already $i$ customers in the system if and only if $r_x$ is at least as large as $R_{n,\alpha}(i)$. That is, given $n$, $\alpha$, and $i$, there is a minimal reward that will be accepted. This is to be expected, for all customer classes have the same service time distribution. (See Cramer [7, pp. 38-57] and Lippman and Ross [12] for two models in which service time depends upon the customer class.)

To garner more information about the behavior of $R_{n,\alpha}(i)$, the minimal acceptable reward, we need to know more about the behavior of $V_{n,\alpha}(i,x)$, the $n$-period $\alpha$-discounted return function. It is clear upon reflection that $V_{n,\alpha}(i,x)$ decreases in $i$ and $\alpha$ and increases in $n$ and $x$.


More important, however, is knowledge of the differences $v_{n,\alpha}(i,x) \equiv V_{n,\alpha}(i,x) - V_{n,\alpha}(i+1,x)$ and $v_{n,\alpha}(i) \equiv V_{n,\alpha}(i) - V_{n,\alpha}(i+1)$ (so that, when $Q<\infty$, $R_{n+1,\alpha}(i) = v_{n,\alpha}(i)$). It will be shown that $v_{n,\alpha}(i,x)$ increases in $i$ and in $n$ and decreases in $\alpha$, so that $R_{n,\alpha}(i)$ increases in $i$ and in $n$ and decreases in $\alpha$. This knowledge will then be applied in seeking planning horizon results.

The  next  Theorem  states  the  intuitively  appealing  idea  that  we  become 
less  eager  to  serve  customers  as  the  system  fills  up. 

THEOREM 4: Given $\alpha > 0$, $n$, and $x$, the functions $V_{n,\alpha}(\cdot,x)$ and $V_{n,\alpha}(\cdot)$ are concave, so that $R_{n,\alpha}(\cdot)$ is a nondecreasing function.

Proof: Let $\alpha > 0$ be given and suppress it from the notation. We claim that $H(i) = (i\wedge c)V_n(i-1,0) + [c-(i\wedge c)]V_n(i,0)$ is concave if $V_n(\cdot,0)$ is concave. To begin, let $f(j) = V_n(i+j,0)$ and take $i+2 \le c$. Then

$$\begin{aligned} H(i) - H(i+1) - [H(i+1) - H(i+2)] &= i f(-1) + (c-i)f(0) - 2[(i+1)f(0) + (c-i-1)f(1)] + [(i+2)f(1) + (c-i-2)f(2)] \\ &= i\{[f(-1)-f(0)] - [f(0)-f(1)]\} + (c-i-2)\{[f(0)-f(1)] - [f(1)-f(2)]\}, \end{aligned}$$

and the concavity of $f(\cdot) = V_n(i+\cdot,0)$ implies that each term in braces is nonpositive. If $i+1 = c$, the left side equals $(c-1)\{[f(-1)-f(0)] - [f(0)-f(1)]\} + [f(1)-f(0)]$, and $f(1)-f(0) \le 0$ because $V_n(\cdot,0)$ is nonincreasing. If $i \ge c$, then the left side becomes $c\{[f(-1)-f(0)] - [f(0)-f(1)]\} \le 0$, justifying the claim.

Next, we claim that concavity of $V_n(\cdot)$ implies that of $V_{n+1}(\cdot,x)$ for each $x$. To see this, fix $x$ and let $\hat\imath$ be the smallest $i$ for which $V_n(i) > r_x + V_n(i+1)$. To show that $f(i) \equiv v_{n+1}(i,x) - v_{n+1}(i+1,x) \le 0$, we consider four cases. For $i < \hat\imath - 2$ and $i \ge \hat\imath$ we have

$$f(i) = V_n(i+1) - V_n(i+2) - [V_n(i+2) - V_n(i+3)]$$

and

$$f(i) = V_n(i) - V_n(i+1) - [V_n(i+1) - V_n(i+2)],$$

respectively, and both are nonpositive by concavity of $V_n(\cdot)$. For $i = \hat\imath - 1$ and $i = \hat\imath - 2$ we have

$$f(\hat\imath-1) = r_x - [V_n(\hat\imath) - V_n(\hat\imath+1)]$$

and

$$f(\hat\imath-2) = V_n(\hat\imath-1) - V_n(\hat\imath) - r_x,$$

respectively, and both are nonpositive by the definition of $\hat\imath$. This justifies our claim that $V_{n+1}(\cdot,x)$ is concave if $V_n(\cdot)$ is concave.

Now $V_1(i,x) = r_x$ or $0$ depending upon whether $i < c+Q$ or $i = c+Q$, so $V_1(\cdot,x)$ is concave. Coupling this fact with our first claim, we see that $V_1(\cdot)$ is concave, for the sum of concave functions is concave.

Assume that $V_n(\cdot,x)$ is concave for each $x$. Then, as for the case $n=1$, it follows that $V_n(\cdot)$ is concave. But now our second claim yields the desired result; namely, $V_{n+1}(\cdot,x)$ is concave for each $x$.

A simple consequence of the concavity of $V_{n-1}(\cdot)$ is the fact that $R_{n,\alpha}(i+1) \ge R_{n,\alpha}(i)$.

Q.E.D. 
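The algebra in the first claim of the proof of Theorem 4 can be checked mechanically. The sketch below (function names are ours) evaluates $H(i) = (i\wedge c)V(i-1) + [c-(i\wedge c)]V(i)$ for a random nonincreasing concave sequence standing in for $V_n(\cdot,0)$, and compares its second difference with the decomposition used in the proof, including the boundary case $i+1=c$.

```python
import random
from typing import List


def H(V: List[float], i: int, c: int) -> float:
    """H(i) = (i ^ c) V(i-1) + (c - i ^ c) V(i), with V(k) standing for V_n(k, 0)."""
    busy = min(i, c)
    left = V[i - 1] if i > 0 else 0.0        # its coefficient busy is 0 when i == 0
    return busy * left + (c - busy) * V[i]


def second_difference(V: List[float], i: int, c: int) -> float:
    """H(i) - H(i+1) - [H(i+1) - H(i+2)]."""
    return H(V, i, c) - 2 * H(V, i + 1, c) + H(V, i + 2, c)


def decomposition(V: List[float], i: int, c: int) -> float:
    """Right side of the identity in the proof (valid for i + 2 <= c)."""
    d = lambda k: V[k] - V[k + 1]            # first difference V(k) - V(k+1)
    first = i * (d(i - 1) - d(i)) if i > 0 else 0.0
    return first + (c - i - 2) * (d(i) - d(i + 1))


def random_concave_nonincreasing(length: int, seed: int = 0) -> List[float]:
    """Partial sums of sorted nonpositive steps: a nonincreasing concave sequence."""
    rng = random.Random(seed)
    steps = sorted((rng.uniform(-1.0, 0.0) for _ in range(length - 1)), reverse=True)
    V = [0.0]
    for s in steps:
        V.append(V[-1] + s)
    return V
```

For every $i$ the second difference should be nonpositive, which is the concavity of $H$ asserted in the proof.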

In addition to the decision maker's diminishing willingness to accept customers as the queue builds up (Theorem 4), our next result asserts that he becomes more and more selective as the length of the horizon increases. (This result also holds for the truly continuous time problem; see Theorem 7.3 of [17].)

THEOREM 5: For each $\alpha > 0$, $x$, and $i$, $v_{n,\alpha}(i,x)$ is a nondecreasing function of $n$, so that $R_{n,\alpha}(i)$ is a nondecreasing function of $n$.

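Proof: Setting $v_n(i) \equiv v_{n,\alpha}(i) = R_{n+1,\alpha}(i)$ and $v_n(i,x) \equiv v_{n,\alpha}(i,x)$, we desire to show that

$$v_n(i) \ge v_{n-1}(i), \qquad i = 0,1,\ldots,c+Q-1, \tag{$S_n$}$$

holds for each $n \ge 1$. Since $v_0 \equiv 0$ and $v_1 \ge 0$, the statement $S_1$ is true.

Assume $S_n$ is true. We claim that $S_n$ implies $C_{n+1}$, where $C_{n+1}$ is defined by

$$v_{n+1}(i,x) \ge v_n(i,x), \qquad \text{all } i, x.$$

But $C_{n+1}$ implies $S_{n+1}$, as is easily seen from (8).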

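Thus, it only remains to show that $S_n$ implies $C_{n+1}$. By Theorem 4, we need consider only six (instead of 16) cases. Fix $x$ and, for each $m$, define $i_{m+1}$ to be the smallest $i$ for which $V_m(i) > r_x + V_m(i+1)$, i.e., for which $v_m(i) > r_x$. Thus a class $x$ customer is accepted in state $i$ with $m+1$ periods remaining if and only if $i < i_{m+1}$, and $S_n$ implies $i_{n+1} \le i_n$.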

Case 1: $i+1 < i_{n+1}$, $i+1 < i_n$; by $S_n$ we have
$$v_{n+1}(i,x) = v_n(i+1) \ge v_{n-1}(i+1) = v_n(i,x).$$

Case 2: $i+1 = i_{n+1} < i_n$:
$$v_{n+1}(i,x) = r_x \ge v_{n-1}(i+1) = v_n(i,x).$$

Case 3: $i \ge i_{n+1}$, $i+1 < i_n$:
$$v_{n+1}(i,x) = v_n(i) > r_x \ge v_{n-1}(i+1) = v_n(i,x).$$

Case 4: $i < i_{n+1} \le i_n \le i+1$:
$$v_{n+1}(i,x) = r_x = v_n(i,x).$$

Case 5: $i \ge i_{n+1}$, $i < i_n \le i+1$:
$$v_{n+1}(i,x) = v_n(i) > r_x = v_n(i,x).$$

Case 6: $i \ge i_{n+1}$, $i \ge i_n$; by $S_n$ we have
$$v_{n+1}(i,x) = v_n(i) \ge v_{n-1}(i) = v_n(i,x).$$

Q.E.D. 

Monotone behavior of $R_{n,\alpha}(i)$ as a function of $i$ and $n$ was exhibited in Theorems 4 and 5, respectively. It is now shown that the minimal acceptable reward $R_{n,\alpha}(i)$ is a nonincreasing function of $\alpha$. This is to be expected, for as $\alpha$ increases the future looks less attractive or promising, whereas the present is not responsive to changes in $\alpha$.


THEOREM 6: For each $n$, $x$, and $i$, $v_{n,\alpha}(i,x)$ is a nonincreasing function of $\alpha$, so that $R_{n,\alpha}(i)$ is a nonincreasing function of $\alpha$.

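Proof: Writing $(\alpha+\Lambda)R_{n+1,\alpha}(i)$ as

$$\lambda\int v_{n,\alpha}(i,y)\,p(dy) + (i\wedge c)\mu\,v_{n,\alpha}(i-1,0) + \mu[c-((i+1)\wedge c)]\,v_{n,\alpha}(i,0),$$

it is clear that $R_{n+1,\alpha}(i)$ is nonincreasing in $\alpha$ if, for each $x$, $v_{n,\alpha}(i,x)$ is also: each coefficient above is nonnegative, each $v_{n,\alpha} \ge 0$, and the factor $(\alpha+\Lambda)^{-1}$ decreases in $\alpha$.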

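Next, note that $V_{1,\alpha}(i,x)$, and hence $v_{1,\alpha}(i,x)$, is constant in $\alpha$. Now assume that $v_{n,\alpha}(i,x)$ is nonincreasing in $\alpha$, so that the argument above shows that $R_{n+1,\alpha}(i)$ is nonincreasing in $\alpha$. From Theorem 4 we know that $v_{n+1,\alpha}(i,x)$ assumes one of the following three values:

(i) $V_{n,\alpha}(i+1) - V_{n,\alpha}(i+2) = R_{n+1,\alpha}(i+1)$,

(ii) $r_x + V_{n,\alpha}(i+1) - V_{n,\alpha}(i+1) = r_x$,

(iii) $R_{n+1,\alpha}(i)$.

But each of these three expressions is nonincreasing in $\alpha$; indeed, $v_{n+1,\alpha}(i,x) = \min\{\max\{R_{n+1,\alpha}(i), r_x\}, R_{n+1,\alpha}(i+1)\}$, which is nondecreasing in each $R_{n+1,\alpha}$ term and hence nonincreasing in $\alpha$.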

Q.E.D. 


From the definition of $R_{n,\alpha}(i)$, it is obvious that $R_{n,\alpha}(i) \le r_K$, while Theorem 5 states that $R_{n,\alpha}(i)$ is nondecreasing in $n$, so that

$$R_\alpha(i) \equiv \lim_{n\to\infty} R_{n,\alpha}(i)$$

exists. Now if $r_x \ge R_\alpha(i)$, then $r_x \ge R_{n,\alpha}(i)$ for all $n$, while if $r_x < R_\alpha(i)$, then $r_x < R_{n,\alpha}(i)$ for all $n$ sufficiently large. Furthermore, Theorems 4 and 5 reveal that $R_\alpha(i)$ is a nondecreasing function of $i$. Consequently, given $\alpha \ge 0$ and $x \in \mathcal{X}$, there are integers $i_{\alpha,x}$ and $N_{\alpha,x} < \infty$ such that

$$r_x \ge R_{n,\alpha}(i) \ \text{ if } i < i_{\alpha,x}, \qquad r_x < R_{n,\alpha}(i) \ \text{ if } i \ge i_{\alpha,x}, \tag{10}$$

whenever $n \ge N_{\alpha,x}$.

Thus, as long as $N_{\alpha,x}$ or more periods remain, a class $x$ customer can assert, without complete knowledge of $n$, that he will be accepted if $i < i_{\alpha,x}$ and rejected if $i \ge i_{\alpha,x}$.

This naturally raises the question of whether $N_{\alpha,x}$ will also suffice for all other customer classes in $\mathcal{X}$. If the answer is affirmative, then $N_{\alpha,x}$ is called an $\alpha$-planning horizon, and we write $N_\alpha$ instead of $N_{\alpha,x}$ to indicate that it works for all elements of $\mathcal{X}$. If there is an integer $N^*$ and an $\alpha^* > 0$ such that $N^*$ is an $\alpha$-planning horizon for all $0 < \alpha < \alpha^*$, then $N^*$ is called a strong planning horizon.

While the existence of $\alpha$-planning horizons is immediate if $\mathcal{X}$ is a finite set, the situation is not so clear if $\mathcal{X}$ is not finite, nor is the existence of a strong planning horizon transparent even if $\mathcal{X}$ is finite. The next several results reveal that $\alpha$-planning horizons need not exist although weak $\alpha$-planning horizons do exist, that $R_\alpha(i)$ is a continuous function of $\alpha$, and that there is a strong planning horizon if $\mathcal{X}$ is finite.


EXAMPLE 1: An $\alpha$-planning horizon need not exist.

Let $p$ be Lebesgue measure on $\mathcal{X} = [1,2]$, $c = 1$, $Q = 0$, $r_x = x$, $\lambda = 1$, $\mu = 1/4$, and $\alpha$ rational.

A straightforward induction argument shows that $V_{n,\alpha}(i)$ is rational for $i = 0,1$, $\alpha$ rational, and $n = 0,1,2,\ldots$. (Of course, $V_{n,\alpha}(i,x)$ is not necessarily rational.) Hence, $R_{n,\alpha}(0)$ is rational for all $n$ and all rational $\alpha$. Consequently, it suffices to show that $R_\alpha(0)$ is irrational.

Let $\alpha = 0$ and define $g(\pi)$ to be the long-run expected return per unit time when we accept only those customers whose class is $\pi$ or greater. Theorem 5 of Lippman and Ross [12] states that $g(\cdot)$ is unimodal, while Theorem 3 of [12] establishes the optimality of this class of policies.

For this example, we have

$$g(\pi) = \frac{4-\pi^2}{2(9-4\pi)}, \qquad 1 \le \pi \le 2.$$

Thus $\pi^*$, the optimal value of $\pi$, is $(9-\sqrt{17})/4$, an irrational number. But $\pi^* = R_0(0)$. This is seen as follows. Let $W_n$ denote the expected $n$-period return obtained by accepting only customers of class $\pi^*$ or higher. Then $W_n - n g(\pi^*)/\Lambda$ is uniformly bounded (see Veinott [22, p. 1293]). Next, Theorem 5 can be employed to show that if $R_0(0) \ne \pi^*$, then there is an $\varepsilon > 0$ such that $|R_{n,0}(0) - \pi^*| > \varepsilon$ for all $n$ sufficiently large, and from this it can then be shown that $V_{n,0}(0) - n g(\pi^*)/\Lambda$ goes to $-\infty$, which is impossible since $V_{n,0}(0) \ge W_n$.

Q.E.D. 
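With $c = 1$ and $Q = 0$, recursions (7)-(8) collapse to a scalar map for the threshold at $i = 0$: since $V_{n,\alpha}(1,y) = V_{n-1,\alpha}(1)$ and $V_{n,\alpha}(i,0) = V_{n-1,\alpha}(i)$, subtracting the two instances of (8) gives $R_{n+1,\alpha}(0) = \lambda\int_1^2 \max\{y, R_{n,\alpha}(0)\}\,dy/(\alpha+\Lambda)$. The sketch below iterates this map taking $\lambda = 1$ and $\mu = 1/4$ (our reading of the garbled parameter values of Example 1; these are the values for which $(9-\sqrt{17})/4$ is the optimum of $g$) and $\alpha = 0$. It uses exact rational arithmetic for the first few iterates and floating point for the limit; the function names are illustrative.

```python
from fractions import Fraction
import math


def tail_integral(d):
    """Integral of max(y, d) over y in [1, 2] (Lebesgue measure on [1, 2])."""
    if d <= 1:
        return Fraction(3, 2)            # integral of y over [1, 2]
    if d >= 2:
        return d
    return d * d / 2 - d + 2             # d*(d-1) + (4 - d*d)/2


def next_threshold(R, lam, mu, alpha):
    """R_{n+1,alpha}(0) from R_{n,alpha}(0) when c = 1 and Q = 0."""
    Lam = lam + mu                       # Lambda = lambda + c*mu with c = 1
    return lam * tail_integral(R) / (alpha + Lam)


def limit_from_example() -> float:
    """Closed-form limit (9 - sqrt(17))/4 claimed in Example 1."""
    return (9 - math.sqrt(17)) / 4
```

Starting from $R_{1,0}(0) = 0$, exact iteration gives $6/5$, then $152/125$, and so on: every iterate is rational and the sequence increases strictly toward the irrational limit, which is why no finite $N_{0,x}$ can serve every class $x$ just below that limit.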
As evidenced in the example, the problem lies in the fact that if $\{r_{x_m}\}$ is a strictly increasing sequence with limit $R_\alpha(i)$ and if $R_{n,\alpha}(i)$ is strictly increasing in $n$, then $\{N_{\alpha,x_m}\}$ must be unbounded. We say that the finite integer $N_\alpha$ is a weak $\alpha$-planning horizon if, whenever $n \ge N_\alpha$ and the system is in state $(i,x)$, the customer is accepted if $i < i_{\alpha,x}$ and rejected if $i \ge i_{\alpha,x} + 1$, where $i_\alpha : \mathcal{X} \to \{0,1,\ldots,c+Q\}$ is defined in (10).

THEOREM 7: Assume that $\mathcal{X}$ is closed and $r$ is right-continuous, and let $\alpha \ge 0$ be given. Then there is a weak $\alpha$-planning horizon, and $i_\alpha$ is right-continuous.

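Proof: Define $S_j = \{x : i_{\alpha,x} = j\}$ and $x_j = \inf\{x : x \in S_j\}$, $j = 0,1,\ldots,c+Q$. We intend to show that $N_\alpha = \max_j N_{\alpha,x_j}$ works; for this it suffices to show that $x_j \in S_j$ whenever $S_j$ is nonempty.

Take $x, x' \in S_j$ with $x_j \le x' < x$, and let $i < j$. If $r_x = R_\alpha(i)$, then $r_{x'} < R_\alpha(i)$. But $x' \in S_j$, so we must have $r_{x'} \ge R_\alpha(i)$ (and hence $r_x > R_\alpha(i)$). Since $\mathcal{X}$ is closed, $x_j \in \mathcal{X}$. Also, the right-continuity of $r$ implies that $r_{x_j} \ge R_\alpha(i)$. But, too, for $x \in S_j$ we have $r_{x_j} \le r_x < R_\alpha(j)$. Hence, $x_j \in S_j$.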

Q.E.D. 

LEMMA 2: For each $i$ and $n$, $V_{n,\alpha}(i)$ is a continuous function of $\alpha$, so that $R_{n,\alpha}(i)$ is also continuous. Moreover, $R_\alpha(i)$ is a continuous function of $\alpha$.

Proof: A straightforward induction argument establishes the continuity of $V_{n,\alpha}(i)$ and, hence, that of $R_{n,\alpha}(i)$.

Pick $\alpha' \ge 0$ and suppose $R_\alpha(i)$ is not continuous at $\alpha'$. Then either $R_{\alpha'}(i) > R_\alpha(i) + \varepsilon$ for all $\alpha > \alpha'$, or $R_{\alpha'}(i) + \varepsilon < R_\alpha(i)$ for all $\alpha < \alpha'$, where $\varepsilon > 0$, since $R_\alpha(i)$ is monotone. In the former case, this yields

$$R_{n,\alpha'}(i) > R_\alpha(i) + \tfrac{\varepsilon}{2} \ge R_{n,\alpha}(i) + \tfrac{\varepsilon}{2}$$

for $\alpha > \alpha'$ and $n$ large, contradicting the continuity of $R_{n,\alpha}(i)$ at $\alpha'$. The latter case is similar.

Q.E.D.


The results of Lemma 2 and Theorems 4, 5, 6 are easily combined to yield

THEOREM 8: For each $x \in \mathcal{X}$ there are integers $i_x$ $(= \lim_{\alpha\to0} i_{\alpha,x})$ and $M_x$ and a number $\alpha_x > 0$ such that

$$r_x \ge R_{n,\alpha}(i) \ \text{ if } i < i_x, \qquad r_x < R_{n,\alpha}(i) \ \text{ if } i \ge i_x, \tag{11}$$

whenever $n \ge M_x$ and $0 < \alpha < \alpha_x$. In particular, there is a strong planning horizon if $\mathcal{X}$ is finite.

The analysis of the finite horizon problem, as embodied in Lemma 2 and Theorem 8, renders the infinite horizon problem, with or without discounting, practically trivial. In particular, we note the following consequences of Lemma 2 and Theorems 4-8. (1) If the system is in state $(i,x)$, then it is optimal to accept the customer if and only if $r_x \ge R_\alpha(i)$, where $\alpha > 0$ is given. (2) The function $V_\alpha(\cdot)$ is concave, and the return function $V_\alpha(\cdot,x)$ is concave and uniquely satisfies the functional equation of dynamic programming for $\alpha > 0$. (3) Moreover, it is apparent from the continuity of $R_\alpha(i)$ that a strongly optimal policy will not, in general, exist if $\mathcal{X}$ is infinite; however, finiteness of $\mathcal{X}$ does ensure (via Theorem 8) the existence of a strongly optimal policy. (4) In the average cost problem, the functions $h(i,x) = \lim_{\alpha\to0^+}\{V_\alpha(i,x) - V_\alpha(0,0)\}$ and $h(i) = \lim_{\alpha\to0^+}\{V_\alpha(i) - V_\alpha(0)\}$ exist, are concave for each fixed $x$, and satisfy the functional equation

$$h(i,x) = \max\{r_x + h(i+1),\ h(i)\} - g/\Lambda, \tag{12}$$

where $g$ denotes the optimal long-run average return per unit time; moreover, any stationary policy that selects an action maximizing the right side of (12) for each $s \in S$ is average optimal.

In addition, it is worth noting that, in the infinite horizon, the model is, in fact, the appropriate truly continuous time model and not merely some semi-Markov version. That this is so is a result of the fact that there are no actions available unless an arrival occurs. This is in contrast to the model of section 3, for there one can turn servers on or off at any time; but of course it can be shown that to do so at any time other than an arrival or departure is suboptimal [23]. On the other hand, it should be clear that the model with $n$ transitions, as embodied in Equations 7 and 8, is not equivalent to the truly continuous time model with time horizon $n/\Lambda$. Finally, we note that the functional equations (12) and (7) with $n = +\infty$ can be rewritten so that the fictitious events are eliminated.

We conclude by establishing that 0-optimal policies do exist even if $\mathcal{X}$ is infinite and by providing a partial ordering on the (nonempty) set of 0-optimal policies. Utilizing the characterization inherent in this partial ordering, we then show that the maximally accepting policy among the set of policies satisfying the functional equation (12) is 0-optimal. In particular, if $\mathcal{X}$ is finite, then this maximally accepting policy is, in fact, strongly optimal, so that the further computation always associated with finding strongly optimal (and 0-optimal) policies need not be performed (see Miller and Veinott [18a] for the usual algorithm).

THEOREM 9. The stationary policy $R$, defined by

$$R(i) = \lim_{\alpha\to0^+} R_\alpha(i),$$

exists and is 0-optimal. Moreover, suppose two stationary policies $\pi$ and $\sigma$ are 0-optimal and that there is a state $(i',x')$ such that $\pi(i',x') = 1 \ne \sigma(i',x')$ and $\pi(s) = \sigma(s)$ for $s \ne (i',x')$. Then for each $\alpha > 0$, the $\alpha$-discounted return of $\pi$ exceeds that of $\sigma$. In particular, if there is a strongly optimal stationary policy, then it can be characterized as the maximally accepting policy among the class of stationary 0-optimal policies.

Proof: First, reformulate the problem so that the state space is finite and the action space is uncountable. This is accomplished by letting the state be the number of customers already accepted into the queue and the action space be the set of subsets of $\mathcal{X}$. Here, each action specifies the set of customer classes that will be admitted into the queue. It is evident from Theorem 4 that the action space can be further reduced to those subsets of the form $(a,K] \cap \mathcal{X}$ or $[a,K] \cap \mathcal{X}$.

The stationary policy $R(i) = \lim_{\alpha\to0^+} R_\alpha(i)$ exists, as $\mathcal{X}$ is closed by hypothesis and $R_\alpha(i)$ is a monotone function of $\alpha$ by Theorem 6.

Using Blackwell's representation [3a] for the return $V_\beta(\pi)$ of a stationary policy $\pi$, we have

$$V_\beta(\pi) = (1-\beta)^{-1}x_\pi + y_\pi + \varepsilon(\beta,\pi),$$

where $\beta \equiv \Lambda/(\alpha+\Lambda)$. The vectors $x_\pi$ and $y_\pi$ are the unique solutions of

$$x_\pi = Q^*_\pi r_\pi$$

and

$$(I - Q_\pi)y_\pi + x_\pi = r_\pi, \qquad Q^*_\pi y_\pi = 0,$$

whereas the vector function $\varepsilon(\beta,\pi)$ is given by

$$\varepsilon(\beta,\pi) = [H(\beta,\pi) - H(\pi)]\,r_\pi.$$

In the above, $Q_\pi$ is the one-step transition matrix associated with policy $\pi$, $Q^*_\pi$ is the stationary distribution, and $r_\pi(i)$ is the expected immediate reward when action $\pi(i)$ is chosen while in state $i$. Finally,

$$H(\beta,\pi) = [I - \beta(Q_\pi - Q^*_\pi)]^{-1} - Q^*_\pi$$

and

$$H(\pi) = [I - Q_\pi + Q^*_\pi]^{-1} - Q^*_\pi.$$

It is evident that the convergence of $R_\alpha$ to $R$ implies that of $Q_{R_\alpha}$ to $Q_R$, $Q^*_{R_\alpha}$ to $Q^*_R$, and $r_{R_\alpha}$ to $r_R$. Consequently, $x_{R_\alpha} \to x_R$ and $y_{R_\alpha} \to y_R$. Hence, to show $R$ is 0-optimal, it suffices to show that

$$\varepsilon(\beta,R) - \varepsilon(\beta,R_\alpha) \to 0 \quad \text{as } \alpha \to 0.$$

Because $I - \beta(Q_{R_\alpha} - Q^*_{R_\alpha})$ and $I - \beta(Q_R - Q^*_R)$ are nonsingular for $0 < \beta < 1$, the convergence of $Q_{R_\alpha}$, $Q^*_{R_\alpha}$, and $r_{R_\alpha}$ yields $\varepsilon(\beta,R) - \varepsilon(\beta,R_\alpha) \to 0$ as $\alpha \to 0$.

To show that the $\alpha$-discounted return of $\pi$ exceeds that of $\sigma$, observe that, by hypothesis,

$$r_{x'} \ge \lim_{\alpha\to0^+}[V_\alpha(i') - V_\alpha(i'+1)],$$

so by Theorem 6 we have

$$r_{x'} \ge V_\alpha(i') - V_\alpha(i'+1), \qquad \text{all } \alpha > 0.$$

Q.E.D. 
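The representation used in the proof of Theorem 9 can be checked numerically for any finite irreducible chain. The sketch below uses our own names, and `P` and `r` in the check are an arbitrary illustrative transition matrix and reward vector, not taken from the queueing model. It computes $Q^*$, the matrix $H = (I - Q + Q^*)^{-1} - Q^*$, the vectors $x = Q^*r$ and $y = Hr$, and the remainder $\varepsilon(\beta) = (I-\beta Q)^{-1}r - x/(1-\beta) - y$, which should vanish as $\beta \to 1$.

```python
import numpy as np


def stationary_matrix(P: np.ndarray) -> np.ndarray:
    """Q* = 1 pi^T for an irreducible chain: solve pi^T P = pi^T with sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.outer(np.ones(n), pi)


def gain_and_bias(P: np.ndarray, r: np.ndarray):
    """Return x = Q* r, y = H r with H = (I - P + Q*)^{-1} - Q*, and Q* itself."""
    n = P.shape[0]
    Qs = stationary_matrix(P)
    H = np.linalg.inv(np.eye(n) - P + Qs) - Qs
    return Qs @ r, H @ r, Qs


def expansion_remainder(P: np.ndarray, r: np.ndarray, beta: float) -> np.ndarray:
    """epsilon(beta) = V_beta - x/(1 - beta) - y, where V_beta = (I - beta P)^{-1} r."""
    x, y, _ = gain_and_bias(P, r)
    V = np.linalg.solve(np.eye(P.shape[0]) - beta * P, r)
    return V - x / (1.0 - beta) - y
```

The vectors returned by `gain_and_bias` satisfy $(I - Q)y + x = r$ and $Q^*y = 0$, the two conditions that characterize $x_\pi$ and $y_\pi$ in the proof.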

Utilizing Theorem 9, we can now present an efficient algorithm for computing the 0-optimal policy $R$, which is a strongly optimal policy if one exists, as is true whenever $\mathcal{X}$ is finite. First, however, we need the following result concerning Markov decision processes.

LEMMA 3. Consider a Markov decision process with state space $S$, action space $A$, and bounded reward function. Denote the $\beta$-discounted optimal return function by $V_\beta$, the $\beta$-discounted return function of the stationary policy $\pi$ by $V_{\beta,\pi}$, and their difference by $\varepsilon(\beta,\cdot)$. Then if the stationary policy $\pi$, which selects action $\pi(s)$ from state $s$, is 1-optimal and if $\varepsilon(\beta,s)$ is uniformly bounded, $\pi(s)$ is a maximizer in the functional equation

$$h(s) = \max_{a\in A}\Big\{r(s,a) + \int_S h(s')\,dP_{s,s'}(a)\Big\} - g,$$

where $g$ is a constant.


Proof: We have

$$\begin{aligned} V_\beta(s) &= \max_{a\in A}\Big\{r(s,a) + \beta\int_S V_\beta(s')\,dP_{s,s'}(a)\Big\} \\ &= r(s,\pi_s) + \beta\int_S V_{\beta,\pi}(s')\,dP_{s,s'}(\pi_s) + \varepsilon(\beta,s) \\ &= r(s,\pi_s) + \beta\int_S V_\beta(s')\,dP_{s,s'}(\pi_s) + \varepsilon(\beta,s) - \beta\int_S [V_\beta(s') - V_{\beta,\pi}(s')]\,dP_{s,s'}(\pi_s), \end{aligned}$$

where $\varepsilon(\beta,s) = V_\beta(s) - V_{\beta,\pi}(s)$. Since $\pi$ is 1-optimal, $\varepsilon(\beta,s) \to 0$ as $\beta \to 1$ for each $s$. Furthermore, since $\int_S dP_{s,s'}(\pi_s)$ is finite ($=1$) and $\varepsilon(\beta,s)$ is uniformly bounded, $\int_S \varepsilon(\beta,s')\,dP_{s,s'}(\pi_s) \to 0$ as $\beta \to 1$ for each $s$. Therefore, defining $h(s) = \lim_{\beta\to1}[V_\beta(s) - V_\beta(0)]$ and $g = \lim_{\beta\to1}(1-\beta)V_\beta(0)$, we obtain

$$\begin{aligned} h(s) &= r(s,\pi_s) + \lim_{\beta\to1}\Big\{\beta\int_S [V_\beta(s') - V_\beta(0)]\,dP_{s,s'}(\pi_s) - (1-\beta)V_\beta(0) + \varepsilon(\beta,s) - \beta\int_S \varepsilon(\beta,s')\,dP_{s,s'}(\pi_s)\Big\} \\ &= r(s,\pi_s) + \int_S h(s')\,dP_{s,s'}(\pi_s) - g \le \max_{a\in A}\Big\{r(s,a) + \int_S h(s')\,dP_{s,s'}(a)\Big\} - g = h(s). \end{aligned}$$

Q.E.D.


COROLLARY 1. The stationary policy $R$ is the maximally accepting policy that satisfies the functional equation (12); that is, accepting whenever $r_x + h(i+1) \ge h(i)$ (where $h$ is any solution of (12)) is 0-optimal, and strongly optimal if there is a strongly optimal policy.

Proof: By Lemma 3, every 0-optimal policy, including $R$, satisfies (12), whereas Theorem 9 shows that $R$ is maximally accepting among the class of 0-optimal policies.

Let $\pi$ be any policy satisfying the functional equation and suppose that $\pi(i',x') = 1$, yet $R(i') > r_{x'}$; then $r_{x'} + h(i'+1) \ge h(i')$. Since any $h$ that satisfies (12) can be written as

$$h(i,x) = \lim_{\alpha\to0^+}[V_\alpha(i,x) - V_\alpha(0,0)] + c,$$

where $c$ is some finite constant independent of $i$ and $x$, it follows from (12) that

$$r_{x'} \ge \lim_{\alpha\to0^+}[V_\alpha(i') - V_\alpha(i'+1)] = \lim_{\alpha\to0^+} R_\alpha(i') = R(i').$$

This contradicts the fact that $R(i') > r_{x'}$.


Q.E.D. 
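For a finite class set, the undiscounted problem can be explored numerically by relative value iteration on the uniformized chain. In the sketch below, which is an illustration under our own assumptions rather than the algorithm the paper goes on to present, `U[i]` plays the role of the relative value $h(i)$ at occupancy $i$ just before a transition, normalized by $U(0) = 0$. Ties in $r_x + U(i+1) \ge U(i)$ are broken toward acceptance, mirroring the maximally accepting policy of Corollary 1. Names, tolerances, and the tie margin are our own choices.

```python
from typing import List, Sequence, Tuple


def average_admission_policy(lam: float, mu: float, c: int, Q: int,
                             rewards: Sequence[float], probs: Sequence[float],
                             iters: int = 100_000, tol: float = 1e-11
                             ) -> Tuple[float, List[float], List[List[bool]]]:
    """Relative value iteration for the average-return admission problem (sketch).

    Returns (g, U, accept): g is the long-run average return per unit time,
    U[i] the relative value at occupancy i (U[0] = 0), and accept[i][k] is True
    when admitting class k at occupancy i attains the maximum (ties -> accept).
    """
    Lam = lam + c * mu
    top = c + Q
    U = [0.0] * (top + 1)
    step_gain = 0.0
    for _ in range(iters):
        TU = []
        for i in range(top + 1):
            busy = min(i, c)
            arrive = sum(p * (max(r + U[i + 1], U[i]) if i < top else U[i])
                         for r, p in zip(rewards, probs))
            val = lam * arrive + busy * mu * (U[i - 1] if i > 0 else 0.0) \
                + (c - busy) * mu * U[i]
            TU.append(val / Lam)
        step_gain = TU[0]                      # U[0] = 0, so TU[0] - U[0] = TU[0]
        new_U = [v - step_gain for v in TU]    # renormalize so that U[0] = 0
        done = max(abs(a - b) for a, b in zip(new_U, U)) < tol
        U = new_U
        if done:
            break
    accept = [[i < top and r + U[i + 1] >= U[i] - 1e-9 for r in rewards]
              for i in range(top + 1)]
    return Lam * step_gain, U, accept
```

The per-transition gain is multiplied by $\Lambda$ to express $g$ per unit time, matching the $g/\Lambda$ scaling in (12).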


THE CASE Q = ∞

When the queue capacity $Q$ is infinite, it is imperative that $r_\alpha$ be given by (9), for otherwise the optimal decision is always to accept the customer requesting admission. Unfortunately, utilizing (9) renders this case inherently more complicated than the case $Q<\infty$. For example, the function $V_{n,\alpha}(\cdot,x)$ is not concave, nor is it convex unless $c=1$. Consequently, we limit consideration to the case $c=1$. And although $V_{n,\alpha}(\cdot,x)$ and $V_{n,\alpha}(\cdot)$ are convex, this is not sufficient to yield the analog of Theorem 4. The final result of Theorem 4, however, is rather easily obtained, as are the results of Theorem 5. But the results of an appropriately modified analog of Theorem 6 do not appear to hold, so investigation of the sensitivity of the solution as a function of $\alpha$ is, at present, not possible.

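We begin by noting that the minimal acceptable reward $R_{n+1,\alpha}(i)$ is not $v_{n,\alpha}(i)$ but rather $v_{n,\alpha}(i)/\beta^i$, where $\beta \equiv \mu/(\alpha+\mu)$. This can be seen from inspection of the recursive equations ($V_{0,\alpha} \equiv 0$; $\Lambda = \lambda + \mu$)

$$V_{n+1,\alpha}(i,x) = \begin{cases} \max\{r_x\beta^i + V_{n,\alpha}(i+1),\ V_{n,\alpha}(i)\}, & x \in \mathcal{X}, \\ V_{n,\alpha}(i), & x = 0, \end{cases} \qquad i = 0,1,\ldots, \tag{13}$$

and

$$V_{n,\alpha}(i) = \frac{1}{\alpha+\Lambda}\Big\{\lambda\int V_{n,\alpha}(i,y)\,p(dy) + \mu\,V_{n,\alpha}((i-1)^+,0)\Big\}. \tag{14}$$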


As in the case $Q<\infty$, it remains true that the decision maker becomes more discriminating both as the horizon lengthens and as the system fills up.

THEOREM 10: Given $\alpha \ge 0$, $n$, and $x$, the functions $v_{n,\alpha}(i,x)/\beta^i$ and $v_{n,\alpha}(i)/\beta^i$ are both nondecreasing in $i$ and nondecreasing in $n$.

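Proof: Since $v_{1,\alpha}(i,x)/\beta^i = r_x(1-\beta)$ and $v_{1,\alpha}(i)/\beta^i = \lambda(1-\beta)(\alpha+\Lambda)^{-1}\int r_y\,p(dy) = v_{1,\alpha}(i+1)/\beta^{i+1}$, the result is true for $n=1$. Suppressing $\alpha$, assume that $v_n(i,x) \le v_n(i+1,x)/\beta$ for all $x$ and all $i$. Then

$$v_n(i) = \frac{1}{\alpha+\Lambda}\Big\{\lambda\int v_n(i,x)\,p(dx) + \mu\,v_n(i-1,0)\Big\} \le \frac{1}{\beta(\alpha+\Lambda)}\Big\{\lambda\int v_n(i+1,x)\,p(dx) + \mu\,v_n(i,0)\Big\} = \frac{v_n(i+1)}{\beta}.$$

Using $v_n(i) \le v_n(i+1)/\beta$, we only need consider four cases to complete the induction argument.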


Case 1: Accept at $i$, $i+1$, $i+2$.
$$v_{n+1}(i,x) = r_x\beta^i(1-\beta) + v_n(i+1) \le r_x\beta^i(1-\beta) + v_n(i+2)/\beta = v_{n+1}(i+1,x)/\beta.$$

Case 2: Accept at $i$ and $i+1$.
$$v_{n+1}(i,x) \le r_x\beta^i = v_{n+1}(i+1,x)/\beta.$$

Case 3: Accept at $i$.
$$v_{n+1}(i,x) = r_x\beta^i < v_n(i+1)/\beta = v_{n+1}(i+1,x)/\beta.$$

Case 4: Do not accept at $i$, $i+1$, $i+2$.
$$v_{n+1}(i,x) = v_n(i) \le v_n(i+1)/\beta = v_{n+1}(i+1,x)/\beta.$$

The proof that the two functions do not decrease as $n$ increases is nearly the same as the proof of Theorem 5.


Q.E.D. 
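Theorem 10 can be probed numerically by iterating (13)-(14) on a truncated state space. Because blocking admissions at a level $N$ only corrupts $V_{n,\alpha}(i)$ for $i > N - n$, the entries inspected below are exact. The sketch (names, the finite class set, and the truncation device are our own assumptions) returns $v_{n,\alpha}(i)/\beta^i$ with $v_{n,\alpha}(i) = V_{n,\alpha}(i) - V_{n,\alpha}(i+1)$.

```python
from typing import Dict, List, Sequence


def scaled_differences(lam: float, mu: float, alpha: float,
                       rewards: Sequence[float], probs: Sequence[float],
                       n_max: int, N: int) -> Dict[int, List[float]]:
    """Iterate (13)-(14) for c = 1, Q = infinity on states 0..N (sketch).

    Returns d[n][i] = v_{n,alpha}(i) / beta**i, v_{n,alpha}(i) = V_{n,alpha}(i) - V_{n,alpha}(i+1).
    Truncation only corrupts V_{n,alpha}(i) for i > N - n, so d[n][i] is exact
    for i <= N - n - 1.
    """
    Lam = lam + mu
    beta = mu / (alpha + mu)
    V_prev = [0.0] * (N + 1)                 # V_{n-1,alpha}(.), starting from V_0 = 0
    d: Dict[int, List[float]] = {}
    for n in range(1, n_max + 1):
        V = []
        for i in range(N + 1):
            if i < N:   # (13): fee collected at service start, discounted by beta**i
                decisions = [max(r * beta ** i + V_prev[i + 1], V_prev[i]) for r in rewards]
            else:       # truncation: no admissions at level N
                decisions = [V_prev[i]] * len(rewards)
            arrive = lam * sum(p * v for p, v in zip(probs, decisions))
            depart = mu * V_prev[max(i - 1, 0)]   # V_{n,alpha}((i-1)^+, 0) = V_{n-1,alpha}((i-1)^+)
            V.append((arrive + depart) / (alpha + Lam))   # (14)
        d[n] = [(V[i] - V[i + 1]) / beta ** i for i in range(N)]
        V_prev = V
    return d
```

The minimal acceptable reward with $n+1$ periods remaining is then `d[n][i]`, so monotonicity of `d` in $i$ and $n$ is exactly the behavior Theorem 10 describes.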


V. AN M/M/1 QUEUE WITH VARIABLE SERVICE RATE


The model considered in this section is a generalization of a model first considered by Crabill [5,6] and later modified by Sabeti [20]. Here, we have an M/M/1 system with arrival rate $\lambda$, service rate $\mu$, and infinite queue capacity. The decision variable is the exponential service rate $\mu$ to be employed, where $\mu$ lies in some subset $A$ of $[0,\bar\mu]$, $\bar\mu < \infty$. The cost structure consists of three parts: a holding cost $h$ per customer per unit time, a service cost $c_\mu$ per unit time when the service rate is $\mu$, and a reward of $R > 0$ that is received whenever a customer completes service. Of course, $c$ is taken to be a strictly increasing function with $c_0 > 0$ and $c_{\bar\mu} < \infty$. To ensure the existence of an $\alpha$-optimal $n$-period policy, we assume that $c$ is left continuous and that $A$ is a closed set.

Here, the state of the system is simply the number of customers in the system, so the state space $S$ is $\{0,1,2,\ldots\}$ and the action space is the set $A$, previously defined. Taking $\Lambda = \lambda + \bar\mu$, the law of motion $q$ is given by ($q(-1\mid0,\mu) = 0$ and $q(0\mid0,\mu) = \bar\mu/\Lambda$)

$$q(i+1\mid i,\mu) = \frac{\lambda}{\Lambda}, \qquad q(i\mid i,\mu) = \frac{\bar\mu-\mu}{\Lambda}, \qquad q(i-1\mid i,\mu) = \frac{\mu}{\Lambda},$$

while $r_\alpha(i,\mu) = (c_\mu + hi - \mu R)/(\alpha+\Lambda)$. This easily leads to the recursive equations


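$$V_{n+1,\alpha}(i) = \frac{1}{\alpha+\Lambda}\min_{\mu\in A}\{c_\mu + hi + V_{n,\alpha}(i,\mu)\}, \tag{15}$$

where ($V_{0,\alpha} \equiv 0$ and $V_{n,\alpha}(0,\mu) \equiv V_{n,\alpha}(0,0)$, all $\mu \in A$)

$$V_{n,\alpha}(i,\mu) = \lambda V_{n,\alpha}(i+1) + (\bar\mu-\mu)V_{n,\alpha}(i) + \mu[V_{n,\alpha}(i-1) - R]. \tag{16}$$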


The model, as incorporated in Equations (15) and (16), generalizes Crabill's model in that $A$ is not required to be a finite set and $R$ is not required to be 0. On the other hand, Crabill lets $h_i$ be the holding cost rate, where $h_i$ is an unbounded nondecreasing function, so the case $h_i = h\cdot i$ is included by Crabill. We can, however, relax the assumption that $h_i = h\cdot i$ and assume only that $h_i$ is an unbounded nondecreasing convex function. This relaxation is also possible for the model of section 3, and the proofs in both sections 3 and 5 go through without change.

We shall consider finite and infinite horizon problems with and without discounting, whereas Crabill restricted his investigation to the class of stationary policies for the infinite horizon average expected cost case.

Sabeti's model assumed $A$ finite, $h = 0$, and a finite queue capacity, again with average cost as the criterion of optimality. As the changes needed to incorporate this model are fairly straightforward, we concentrate solely upon the model as posed in Equations 15 and 16.

To begin, define $\mu^*_{n,\alpha}(i)$ to be an optimal service rate when the system is in state $i$, $n$ periods remain, and the discount factor is $\alpha \ge 0$. Also, define

$$v_{n,\alpha}(i) \equiv V_{n,\alpha}(i) - V_{n,\alpha}(i-1)$$

and

$$D_{n,\alpha}(i) \equiv \min_{\mu\in A}\{c_\mu - \mu(v_{n,\alpha}(i)+R)\} - \min_{\mu\in A}\{c_\mu - \mu(v_{n,\alpha}(i-1)+R)\}.$$

As in section IV, the behavior of $\mu^*$ will be gleaned from that of these three functions.

A policy possessing the intuitively appealing property that more customers in the system leads to a faster service rate (i.e., $\mu^*_{n,\alpha}(i)$ is nondecreasing as a function of $i$) is termed a connected or switch-over policy. Crabill's principal result states that if attention is restricted to the class of stationary policies, then there is a switch-over policy that is average optimal. In showing that $V_{n,\alpha}(\cdot)$ is convex, we extend this result to the finite horizon problems and to the infinite horizon discounted problem, without restriction to the class of stationary policies. From this result, we proceed by showing that $\mu^*_{n,\alpha}(i)$ increases with $n$ and decreases with $\alpha$, so that there is a strong planning horizon and, consequently, a strongly optimal policy if $A$ is finite.

Before presenting our first result, it should be noted that although the introduction of the reward $R$ introduces the possibility of $V_{n,\alpha}(i)$ being negative, a straightforward induction argument establishes the hoped-for fact that $V_{n,\alpha}(i)$ is a nondecreasing function of $i$; i.e., $v_{n,\alpha}(i) \ge 0$.


THEOREM 11: For each $\alpha > 0$ and $n$, $V_{n,\alpha}(\cdot)$ is convex, so that $\mu^*_{n,\alpha}(i)$ is a nondecreasing function of $i$; that is, $\mu^*_{n,\alpha}$ is a switch-over policy. In addition, $V_{n,\alpha}(\cdot,\mu)$ is convex for each $\mu \in A$.
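Proof: First fix $\alpha > 0$ and note that $V_{1,\alpha}(\cdot)$ is linear, and assume that $V_{n,\alpha}(\cdot)$ is convex. Suppressing $\alpha$ and setting $\mu_i \equiv \mu^*_{n+1,\alpha}(i)$, we have (bounding $V_{n+1}(i)$ from above by the cost of using $\mu_{i+1}$ in state $i$)

$$(\alpha+\Lambda)v_{n+1}(i+1) \ge h + \lambda v_n(i+2) + \bar\mu v_n(i+1) - \mu_{i+1}[v_n(i+1) - v_n(i)].$$

Similarly (using $\mu_{i-1}$ in state $i$),

$$(\alpha+\Lambda)v_{n+1}(i) \le h + \lambda v_n(i+1) + \bar\mu v_n(i) - \mu_{i-1}[v_n(i) - v_n(i-1)].$$

Combining these two inequalities, we obtain

$$(\alpha+\Lambda)[v_{n+1}(i+1) - v_{n+1}(i)] \ge \lambda[v_n(i+2) - v_n(i+1)] + (\bar\mu - \mu_{i+1})[v_n(i+1) - v_n(i)] + \mu_{i-1}[v_n(i) - v_n(i-1)] \ge 0,$$

since $\mu_{i+1} \le \bar\mu$, $\mu_{i-1} \ge 0$, and $V_n(\cdot)$ is convex by hypothesis. This completes the induction argument.

Next, observe that $V_{n+1,\alpha}(\cdot)$ can be written as follows:

$$V_{n+1,\alpha}(i) = \frac{1}{\alpha+\Lambda}\Big\{hi + \lambda V_{n,\alpha}(i+1) + \bar\mu V_{n,\alpha}(i) + \min_{\mu\in A}[c_\mu - \mu(v_{n,\alpha}(i)+R)]\Big\}. \tag{17}$$

The existence of a $\mu$ in $A$ minimizing $c_\mu - \mu(v_{n,\alpha}(i)+R)$ follows from $c$ being left continuous and $A$ closed. Since $V_{n,\alpha}(\cdot)$ is convex, $v_{n,\alpha}(i+1) \ge v_{n,\alpha}(i)$, so the desired result, namely $\mu^*_{n+1,\alpha}(i+1) \ge \mu^*_{n+1,\alpha}(i)$, follows immediately when this last fact is coupled with $c$ strictly increasing and (17).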


Q.E.D. 


REMARK. A closer inspection of Equation 17 reveals that if $c_\mu$ is continuous, then $\mu^*_{n,\alpha}(i+1) \ge \mu^*_{n,\alpha}(i)$ even if $c_\mu$ is not a nondecreasing function.
Using the usual definition of a transition, Prabhu and Stidham have obtained Theorem 11 by appending the following assumption: $c_\mu$ is itself a convex function of $\mu$.

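THEOREM 12: For each $\alpha > 0$ and $i \in S$, $v_{n,\alpha}(i)$ is a nondecreasing function of $n$, so that $\mu^*_{n,\alpha}(i)$ is also a nondecreasing function of $n$.

Proof: From (17), it is clear that the desired monotonicity of $\mu^*_{n,\alpha}(i)$ is a simple consequence of $v_{n+1,\alpha}(i) \ge v_{n,\alpha}(i)$. Since $v_0(i) = 0$ and $v_1(i) \ge 0$, the result is true for $n=0$. Assume it true for $n-1$; i.e., $v_n(i) \ge v_{n-1}(i)$. Utilizing (17), it can easily be seen that

$$(\alpha+\Lambda)v_{n+1,\alpha}(i) = h + \lambda v_{n,\alpha}(i+1) + \bar\mu v_{n,\alpha}(i) + D_{n,\alpha}(i).$$

Writing $\mu^*_{n,\alpha}(i) = \mu(n,i)$, we have

$$D_{n,\alpha}(i) \ge c_{\mu(n+1,i)} - \mu(n+1,i)[v_n(i)+R] - \{c_{\mu(n,i-1)} - \mu(n,i-1)[v_n(i-1)+R]\},$$

$$D_{n-1,\alpha}(i) \le c_{\mu(n+1,i)} - \mu(n+1,i)[v_{n-1}(i)+R] - \{c_{\mu(n,i-1)} - \mu(n,i-1)[v_{n-1}(i-1)+R]\},$$

and thus

$$\bar\mu v_n(i) + D_n(i) - [\bar\mu v_{n-1}(i) + D_{n-1}(i)] \ge [\bar\mu - \mu(n+1,i)][v_n(i) - v_{n-1}(i)] + \mu(n,i-1)[v_n(i-1) - v_{n-1}(i-1)] \ge 0.$$

Since also $\lambda[v_n(i+1) - v_{n-1}(i+1)] \ge 0$, we conclude that $v_{n+1}(i) \ge v_n(i)$.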

Q.E.D. 
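Theorem 12 can be checked in a hypothetical numerical instance. The sketch below (assumed data: $\lambda = 0.8$, $A = \{0, 0.5, 1, 1.5, 2\}$, $c_\mu = \mu^2$, $h = R = 1$, $\alpha = 0.1$; truncation at $N$ with arrivals blocked and no completion in state 0 are also assumptions) runs value iteration on (17) and verifies $v_{n+1,\alpha}(i) \ge v_{n,\alpha}(i)$ and $\mu^*_{n+1,\alpha}(i) \ge \mu^*_{n,\alpha}(i)$ on states the truncation cannot reach.

```python
# Hypothetical check of Theorem 12 via value iteration for (17).


def iterate(n_max, alpha, lam=0.8, rates=(0.0, 0.5, 1.0, 1.5, 2.0),
            cost=lambda m: m * m, h=1.0, R=1.0, N=60):
    """Hypothetical value iteration for (17): V[n][i] ~ V_{n,alpha}(i), mu_star[n][i] ~ mu*_{n,alpha}(i)."""
    mu_bar = max(rates)
    Lam = lam + mu_bar                          # uniformization constant Lambda
    V, mu_star = [[0.0] * (N + 1)], [None]      # V_0 = 0; no decision when n = 0
    for _ in range(n_max):
        old, new, pick = V[-1], [], []
        for i in range(N + 1):
            if i == 0:                          # assumption: no completion, hence no R, in state 0
                best = min((cost(m), m) for m in rates)
            else:                               # min over A of c_mu - mu * (v_{n,alpha}(i) + R)
                v = old[i] - old[i - 1]
                best = min((cost(m) - m * (v + R), m) for m in rates)
            up = old[min(i + 1, N)]             # assumption: arrivals blocked at the truncation level N
            new.append((h * i + lam * up + mu_bar * old[i] + best[0]) / (alpha + Lam))
            pick.append(best[1])                # tuple order breaks ties toward the slower rate
        V.append(new)
        mu_star.append(pick)
    return V, mu_star


N, n_max = 60, 20
V, mu_star = iterate(n_max, alpha=0.1, N=N)


def v(n, i):
    return V[n][i] - V[n][i - 1]                # v_{n,alpha}(i)


# v_{n+1,alpha}(i) is unaffected by the truncation when i + n + 1 <= N
pairs = [(n, i) for n in range(n_max) for i in range(1, N - n)]
v_monotone = all(v(n + 1, i) >= v(n, i) - 1e-9 for n, i in pairs)
mu_monotone = all(mu_star[n + 1][i] >= mu_star[n][i] for n, i in pairs if n >= 1)
print(v_monotone, mu_monotone)
```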

The existence of the limit on $n$ of $\mu^*_{n,\alpha}(i)$ is immediate from Theorem 12, and we would expect this limit to be $\alpha$-optimal provided that $\lim_{n\to\infty} V_{n,\alpha}(i)$ exists. Clearly $\langle V_{n,\alpha}(i)\rangle_{n=1}^{\infty}$ is a bounded sequence (see the proof of Theorem 2 for the necessary technique), but is it monotone? Evidently not if $R>0$. Nevertheless, the limit does exist, and we have

THEOREM 13: For $\alpha>0$, $V_\alpha(\cdot)$ exists, is convex, and is the unique solution to the functional equation of dynamic programming. Moreover, the stationary policy $\mu^*_\alpha$, defined by

$$\mu^*_\alpha(i) = \lim_{n\to\infty}\mu^*_{n,\alpha}(i),$$

is $\alpha$-optimal and is a switch-over policy.


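Proof: If rate $\mu^*_{m+j,\alpha}(i)$ is employed rather than $\mu^*_{j,\alpha}(i)$ when $j$ periods remain, $j = 1,2,\dots,n$, then an upper bound on $V_{n,\alpha}(i)$ is obtained, and the first $n$ periods contribute no difference in cost between $V_{n+m,\alpha}(i)$ and this bound for $V_{n,\alpha}(i)$. As the cost of any single period is at least $-\bar\mu R/(\alpha+\Lambda)$, we obtain

$$V_{n+m,\alpha}(i) - V_{n,\alpha}(i) \ge -\frac{\bar\mu R}{\alpha+\Lambda}\sum_{j=n}^{n+m-1}\Big(\frac{\Lambda}{\alpha+\Lambda}\Big)^{j} \ge \Big(\frac{\Lambda}{\alpha+\Lambda}\Big)^{n}\Big(-\frac{\bar\mu R}{\alpha}\Big).$$

Similarly, using the optimal $n$-period rates in the first $n$ periods of the $(n+m)$-period problem, and noting that the state rises by at most one per period while the cost of a period spent in state $x$ is at most $(hx + c_{\bar\mu})/(\alpha+\Lambda)$, we obtain

$$V_{n+m,\alpha}(i) - V_{n,\alpha}(i) \le \frac{1}{\alpha+\Lambda}\Big(\frac{\Lambda}{\alpha+\Lambda}\Big)^{n}\sum_{j=0}^{m-1}\Big(\frac{\Lambda}{\alpha+\Lambda}\Big)^{j}\big[h(i+n+j) + c_{\bar\mu}\big].$$

Consequently, $\langle V_{n,\alpha}(i)\rangle_{n=1}^{\infty}$ is a Cauchy sequence for each $i$ and each $\alpha>0$, and thus $V_\alpha(\cdot)$ exists.

The remaining facts follow as in the proof of Theorem 2 except for the $\alpha$-optimality of $\mu^*_\alpha$, which (see Equation 17) follows from $\mu^*_{n+1,\alpha}(i)$ being a minimizer of $c_\mu - \mu(v_{n,\alpha}(i)+R)$ and $\langle v_{n,\alpha}(i)\rangle_{n=1}^{\infty}$ being a nondecreasing sequence with limit $v_\alpha(i)$. That $\mu^*_\alpha$ is a switch-over policy is a consequence of Theorem 11.

Q.E.D.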
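The two bounds in the proof of Theorem 13 can be compared with computed values. This is a sketch under hypothetical data ($\lambda = 0.8$, $A = \{0, 0.5, 1, 1.5, 2\}$ so $\bar\mu = 2$, $c_\mu = \mu^2$ so $c_{\bar\mu} = 4$, $h = R = 1$, $\alpha = 0.1$); the truncation at $N$ and the treatment of state 0 are assumptions, and every tested $(n, m, i)$ satisfies $i + n + m \le N$, so the truncation plays no role. The lower bound relies on $c_\mu \ge 0$, which holds for this choice of $c$. Both bounds carry the factor $[\Lambda/(\alpha+\Lambda)]^n$, which is what makes the sequence Cauchy.

```python
# Hypothetical check of the Cauchy bounds in the proof of Theorem 13.


def iterate(n_max, alpha, lam=0.8, rates=(0.0, 0.5, 1.0, 1.5, 2.0),
            cost=lambda m: m * m, h=1.0, R=1.0, N=60):
    """Hypothetical value iteration for (17): V[n][i] ~ V_{n,alpha}(i), mu_star[n][i] ~ mu*_{n,alpha}(i)."""
    mu_bar = max(rates)
    Lam = lam + mu_bar                          # uniformization constant Lambda
    V, mu_star = [[0.0] * (N + 1)], [None]      # V_0 = 0; no decision when n = 0
    for _ in range(n_max):
        old, new, pick = V[-1], [], []
        for i in range(N + 1):
            if i == 0:                          # assumption: no completion, hence no R, in state 0
                best = min((cost(m), m) for m in rates)
            else:                               # min over A of c_mu - mu * (v_{n,alpha}(i) + R)
                v = old[i] - old[i - 1]
                best = min((cost(m) - m * (v + R), m) for m in rates)
            up = old[min(i + 1, N)]             # assumption: arrivals blocked at the truncation level N
            new.append((h * i + lam * up + mu_bar * old[i] + best[0]) / (alpha + Lam))
            pick.append(best[1])                # tuple order breaks ties toward the slower rate
        V.append(new)
        mu_star.append(pick)
    return V, mu_star


alpha, lam, h, R, N = 0.1, 0.8, 1.0, 1.0, 100
mu_bar, c_bar = 2.0, 4.0                        # mu_bar = max(A), c_bar = c_{mu_bar} = mu_bar**2
Lam = lam + mu_bar
rho = Lam / (alpha + Lam)                       # one-period discount factor Lambda/(alpha+Lambda)
V, _ = iterate(80, alpha=alpha, lam=lam, h=h, R=R, N=N)

checks = []
for n in (5, 10, 20, 40):
    for m in (1, 10, 40):
        for i in (0, 1, 3):                     # i + n + m <= N, so truncation plays no role
            diff = V[n + m][i] - V[n][i]
            lower = -rho ** n * mu_bar * R / alpha
            upper = sum(rho ** (n + j) * (h * (i + n + j) + c_bar)
                        for j in range(m)) / (alpha + Lam)
            checks.append(lower - 1e-9 <= diff <= upper + 1e-9)
print(all(checks), len(checks))
```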
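THEOREM 14: For each $n\ge1$ and $i$, $v_{n,\alpha}(i)$ is a strictly decreasing function of $\alpha$, so that $\mu^*_{n,\alpha}(i)$ is a nonincreasing function of $\alpha$.

Proof: The result holds for $n=1$ as $v_{1,\alpha}(i) = h/(\alpha+\Lambda)$. Assume it holds for $n$. Taking $\alpha_1<\alpha_2$, writing $\mu^*_{n+1,\alpha}(i) = \mu(\alpha,i)$, and employing (17), we have, as in the proof of Theorem 12,

$$(\alpha_1+\Lambda)v_{n+1,\alpha_1}(i) - (\alpha_2+\Lambda)v_{n+1,\alpha_2}(i) \ge \big[\bar\mu - \mu(\alpha_1,i)\big]\big[v_{n,\alpha_1}(i) - v_{n,\alpha_2}(i)\big] + \mu(\alpha_2,i-1)\big[v_{n,\alpha_1}(i-1) - v_{n,\alpha_2}(i-1)\big] \ge 0,$$

where the nonnegative term $\lambda[v_{n,\alpha_1}(i+1) - v_{n,\alpha_2}(i+1)]$ has been dropped. So, since $v_{n+1,\alpha_2}(i) \ge v_{1,\alpha_2}(i) > 0$ by Theorem 12,

$$v_{n+1,\alpha_1}(i) \ge v_{n+1,\alpha_2}(i)\,\frac{\alpha_2+\Lambda}{\alpha_1+\Lambda} > v_{n+1,\alpha_2}(i).$$

Q.E.D.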
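For Theorem 14, the sketch below runs a hypothetical value iteration on (17) ($\lambda = 0.8$, $A = \{0, 0.5, 1, 1.5, 2\}$, $c_\mu = \mu^2$, $h = R = 1$; truncation at $N$ and no completion in state 0 are assumptions) at two discount rates, $\alpha_1 = 0.05$ and $\alpha_2 = 0.2$, and checks that the larger rate gives strictly smaller $v_{n,\alpha}(i)$ and no faster $\mu^*_{n,\alpha}(i)$ on states the truncation cannot reach.

```python
# Hypothetical check of Theorem 14: monotonicity in the discount rate alpha.


def iterate(n_max, alpha, lam=0.8, rates=(0.0, 0.5, 1.0, 1.5, 2.0),
            cost=lambda m: m * m, h=1.0, R=1.0, N=60):
    """Hypothetical value iteration for (17): V[n][i] ~ V_{n,alpha}(i), mu_star[n][i] ~ mu*_{n,alpha}(i)."""
    mu_bar = max(rates)
    Lam = lam + mu_bar                          # uniformization constant Lambda
    V, mu_star = [[0.0] * (N + 1)], [None]      # V_0 = 0; no decision when n = 0
    for _ in range(n_max):
        old, new, pick = V[-1], [], []
        for i in range(N + 1):
            if i == 0:                          # assumption: no completion, hence no R, in state 0
                best = min((cost(m), m) for m in rates)
            else:                               # min over A of c_mu - mu * (v_{n,alpha}(i) + R)
                v = old[i] - old[i - 1]
                best = min((cost(m) - m * (v + R), m) for m in rates)
            up = old[min(i + 1, N)]             # assumption: arrivals blocked at the truncation level N
            new.append((h * i + lam * up + mu_bar * old[i] + best[0]) / (alpha + Lam))
            pick.append(best[1])                # tuple order breaks ties toward the slower rate
        V.append(new)
        mu_star.append(pick)
    return V, mu_star


n, N = 15, 60
V1, mu1 = iterate(n, alpha=0.05, N=N)           # smaller discount rate alpha_1
V2, mu2 = iterate(n, alpha=0.20, N=N)           # larger discount rate alpha_2
ok = range(1, N - n + 1)                        # i + n <= N: unaffected by the truncation
v_decreasing = all(V1[n][i] - V1[n][i - 1] > V2[n][i] - V2[n][i - 1] for i in ok)
mu_nonincreasing = all(mu1[n][i] >= mu2[n][i] for i in ok)
print(v_decreasing, mu_nonincreasing)
```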


In general, the existence of strongly optimal policies necessitates a finite state and action space (see Denardo [8, p. 487]). Consequently, it came as no surprise that in order to ensure the existence of a strong planning horizon in the optimal customer selection model, it was necessary to assume x finite (thereby rendering a problem with finite state and action space). In the model with variable service rate, however, the problem of an infinite state space cannot be assumed away. Fortunately, the next lemma establishes that the model can essentially be reduced to one with a finite set of states, at least for $n$ large and $\alpha$ small, in that $\bar\mu$ is the optimal rate for all large states.

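LEMMA 4: If $\bar\mu$ is an isolated point of $A$ or if $\{(c_{\bar\mu}-c_\mu)/(\bar\mu-\mu) : \mu\in A\setminus\{\bar\mu\}\}$ is bounded, then there are numbers $N^*<\infty$, $i^*<\infty$ and $\alpha^*>0$ such that

$$\mu^*_{n,\alpha}(i) = \bar\mu \quad\text{whenever } n>N^*,\ i>i^*, \text{ and } 0<\alpha<\alpha^*.$$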


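Proof: We claim that

$$v_{n,\alpha}(i+1) \ge \frac{h}{\alpha+\Lambda}\sum_{j=0}^{(n-1)\wedge i}\Big(\frac{\Lambda}{\alpha+\Lambda}\Big)^{j}. \tag{18}$$

This is seen as follows. First, writing $\mu' = \mu^*_{n+1,\alpha}(i+1)$,

$$(\alpha+\Lambda)\,v_{n+1,\alpha}(i+1) \ge h + \lambda v_{n,\alpha}(i+2) + \big[\bar\mu - \mu'\big]v_{n,\alpha}(i+1) + \mu' v_{n,\alpha}(i) \ge h + \Lambda v_{n,\alpha}(i),$$

where the last inequality results from the convexity of $V_{n,\alpha}(\cdot)$. Iterating this inequality yields (18) as claimed.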
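Bound (18) can be checked directly on a hypothetical instance. This sketch runs value iteration on (17) with $\lambda = 0.8$, $A = \{0, 0.5, 1, 1.5, 2\}$, $c_\mu = \mu^2$, $h = 1$, $\alpha = 0.1$ and $R = 0$. The choice $R = 0$ is ours: under our assumed treatment of state 0 (no completion there) we could only confirm the boundary step of the iteration at state 1 for $R = 0$. The truncation at $N$ is also an assumption, and only states with $i + 1 + n \le N$ are tested.

```python
# Hypothetical check of bound (18) with R = 0.


def iterate(n_max, alpha, lam=0.8, rates=(0.0, 0.5, 1.0, 1.5, 2.0),
            cost=lambda m: m * m, h=1.0, R=1.0, N=60):
    """Hypothetical value iteration for (17): V[n][i] ~ V_{n,alpha}(i), mu_star[n][i] ~ mu*_{n,alpha}(i)."""
    mu_bar = max(rates)
    Lam = lam + mu_bar                          # uniformization constant Lambda
    V, mu_star = [[0.0] * (N + 1)], [None]      # V_0 = 0; no decision when n = 0
    for _ in range(n_max):
        old, new, pick = V[-1], [], []
        for i in range(N + 1):
            if i == 0:                          # assumption: no completion, hence no R, in state 0
                best = min((cost(m), m) for m in rates)
            else:                               # min over A of c_mu - mu * (v_{n,alpha}(i) + R)
                v = old[i] - old[i - 1]
                best = min((cost(m) - m * (v + R), m) for m in rates)
            up = old[min(i + 1, N)]             # assumption: arrivals blocked at the truncation level N
            new.append((h * i + lam * up + mu_bar * old[i] + best[0]) / (alpha + Lam))
            pick.append(best[1])                # tuple order breaks ties toward the slower rate
        V.append(new)
        mu_star.append(pick)
    return V, mu_star


alpha, lam, h, N, n_max = 0.1, 0.8, 1.0, 80, 30
V, _ = iterate(n_max, alpha=alpha, lam=lam, h=h, R=0.0, N=N)
Lam = lam + 2.0                                 # lambda + mu_bar for the default rate set
rho = Lam / (alpha + Lam)


def bound18(n, i):
    """Right side of (18): h/(alpha+Lambda) * sum_{j=0}^{min(n-1, i)} rho**j."""
    return h / (alpha + Lam) * sum(rho ** j for j in range(min(n - 1, i) + 1))


# v_{n,alpha}(i+1) is unaffected by the truncation when i + 1 + n <= N
holds = all(V[n][i + 1] - V[n][i] >= bound18(n, i) - 1e-9
            for n in range(1, n_max + 1) for i in range(0, N - n))
print(holds)
```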

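Define $U_{n,\alpha}(i)$ to be the return of the strategy $\sigma_{n,\alpha} = \langle\sigma_{n,\alpha,m}(\cdot)\rangle_{m=1}^{n}$ with service rates given by

$$\sigma_{n,\alpha,m}(i) = \begin{cases}\bar\mu, & \text{if } m=n,\\ \mu^*_{m,\alpha}(i), & \text{if } m<n.\end{cases}$$

Then, since $R \ge 0$,

$$V_{n,\alpha}(i) - U_{n,\alpha}(i) = \frac{1}{\alpha+\Lambda}\Big\{c_{\mu^*_{n,\alpha}(i)} - c_{\bar\mu} + \big(\bar\mu - \mu^*_{n,\alpha}(i)\big)\big(v_{n-1,\alpha}(i)+R\big)\Big\} \ge \frac{1}{\alpha+\Lambda}\Big\{c_{\mu^*_{n,\alpha}(i)} - c_{\bar\mu} + \big(\bar\mu - \mu^*_{n,\alpha}(i)\big)v_{n-1,\alpha}(i)\Big\}. \tag{19}$$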

Let $\epsilon = \bar\mu - \sup\{\mu^*_{n,\alpha}(i) : n\ge1,\ \alpha>0,\ i\in S\}$. If $\epsilon>0$, then it follows from (18)* and (19) that $V_{n,\alpha}(i) - U_{n,\alpha}(i)$ is strictly positive for some finite $i$ and $n$ and strictly positive $\alpha$, a contradiction, since $V_{n,\alpha} \le U_{n,\alpha}$ by optimality.

*The bound given in Equation 18 suffices to correct an error in Equation 31 of reference [11].


Consequently, let us suppose that $\epsilon=0$. If $\mu^*_{n,\alpha}(i) = \bar\mu$ for some $n$, $i$ and $\alpha>0$, then the desired result follows from Theorems 11, 12, and 14. Therefore, assume that $\bar\mu$ is not an isolated point of $A$ and that there is a set $\{\mu^*_{n,\alpha}(i)\}$ with supremum $\bar\mu$ and $\bar\mu$ not in this set. By Theorems 11, 12, and 14, there is a sequence $\langle\mu^*_{n,\alpha_n}(i_n)\rangle$ from this set with $\alpha_{n+1}\le\alpha_n$, $i_{n+1}\ge i_n$, and limit $\bar\mu$. Since $\big\langle\big(c_{\mu^*_{n,\alpha_n}(i_n)} - c_{\bar\mu}\big)/\big(\bar\mu - \mu^*_{n,\alpha_n}(i_n)\big)\big\rangle$ is bounded below by hypothesis and our bound on $\langle v_{n-1,\alpha_n}(i_n)\rangle$ is nondecreasing with limit $+\infty$ by (18), (19) reveals that $V_{n,\alpha_n}(i_n) - U_{n,\alpha_n}(i_n)$ is strictly positive for some finite $n$, again a contradiction.

Q.E.D.
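The comparison strategy behind (19), and the conclusion of Lemma 4, can be illustrated on a hypothetical instance ($\lambda = 0.8$, $A = \{0, 0.5, 1, 1.5, 2\}$, $c_\mu = \mu^2$, $h = R = 1$, $\alpha = 0.05$, $n = 40$; truncation at $N = 100$ and no completion in state 0 are assumptions). The hypothetical helper `U` computes the return of using $\bar\mu$ in the first period and optimal rates afterwards. The code checks that $V_{n,\alpha}(i) - U_{n,\alpha}(i)$ is never positive, that it equals the first expression in (19), and that $\mu^*_{n,\alpha}(i) = \bar\mu$ in every tested state from $i = 20$ upward. A finite $A$ makes $\bar\mu$ an isolated point, so Lemma 4 applies; the cutoff 20 is just where this instance is inspected, not the $i^*$ of the lemma.

```python
# Hypothetical illustration of (19) and Lemma 4.


def iterate(n_max, alpha, lam=0.8, rates=(0.0, 0.5, 1.0, 1.5, 2.0),
            cost=lambda m: m * m, h=1.0, R=1.0, N=60):
    """Hypothetical value iteration for (17): V[n][i] ~ V_{n,alpha}(i), mu_star[n][i] ~ mu*_{n,alpha}(i)."""
    mu_bar = max(rates)
    Lam = lam + mu_bar                          # uniformization constant Lambda
    V, mu_star = [[0.0] * (N + 1)], [None]      # V_0 = 0; no decision when n = 0
    for _ in range(n_max):
        old, new, pick = V[-1], [], []
        for i in range(N + 1):
            if i == 0:                          # assumption: no completion, hence no R, in state 0
                best = min((cost(m), m) for m in rates)
            else:                               # min over A of c_mu - mu * (v_{n,alpha}(i) + R)
                v = old[i] - old[i - 1]
                best = min((cost(m) - m * (v + R), m) for m in rates)
            up = old[min(i + 1, N)]             # assumption: arrivals blocked at the truncation level N
            new.append((h * i + lam * up + mu_bar * old[i] + best[0]) / (alpha + Lam))
            pick.append(best[1])                # tuple order breaks ties toward the slower rate
        V.append(new)
        mu_star.append(pick)
    return V, mu_star


alpha, lam, R, N, n = 0.05, 0.8, 1.0, 100, 40
rates = (0.0, 0.5, 1.0, 1.5, 2.0)
mu_bar, cost = max(rates), (lambda m: m * m)
Lam = lam + mu_bar
V, mu_star = iterate(n, alpha=alpha, lam=lam, rates=rates, cost=cost, R=R, N=N)


def U(n, i, h=1.0):
    """Return of: mu_bar now (n periods left), optimal rates afterwards; needs 1 <= i < N."""
    W = V[n - 1]
    v = W[i] - W[i - 1]
    return (h * i + lam * W[i + 1] + mu_bar * W[i]
            + (cost(mu_bar) - mu_bar * (v + R))) / (alpha + Lam)


ok = range(1, N - n + 2)
gap_ok = all(V[n][i] - U(n, i) <= 1e-9 for i in ok)            # V <= U by optimality
identity_ok = all(
    abs((V[n][i] - U(n, i))
        - (cost(mu_star[n][i]) - cost(mu_bar)
           + (mu_bar - mu_star[n][i]) * (V[n - 1][i] - V[n - 1][i - 1] + R)) / (alpha + Lam))
    < 1e-9
    for i in ok)                                                 # first expression in (19)
fast_tail = all(mu_star[n][i] == mu_bar for i in range(20, N - n + 2))
print(gap_ok, identity_ok, fast_tail)
```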

Theorems 11, 12, 14 and Lemma 4 yield the following strong planning horizon theorem, where $\mu^*$ is defined by

$$\mu^*(i) = \lim_{\alpha\to0^+}\mu^*_\alpha(i).$$

THEOREM 15: If $A$ is finite, then there is an $N^*<\infty$ and $\alpha^*>0$ such that for each $i$,

$$\mu^*_{n,\alpha}(i) = \mu^*(i) \quad\text{whenever } n\ge N^* \text{ and } 0<\alpha<\alpha^*.$$

In particular, $\mu^*$ is strongly optimal.

In Example 2 below, it is demonstrated that a strong planning horizon may exist even if $A$ is infinite, but there is no reason even to believe that a strongly optimal policy will always exist. However, we make the following conjecture: if $c_\mu$ has a bounded derivative, then the stationary policy $\mu^*$ defined by $\mu^*(i) = \lim_{\alpha\to0^+}\mu^*_\alpha(i)$ is 0-optimal.
EXAMPLE 2: A strong planning horizon may exist even if $A$ is infinite.

Suppose $A = [0,\bar\mu]$ and $c$ is linear; that is,

$$c_\mu = K\mu, \qquad K>0.$$

Then $\mu^*_{n,\alpha}(i) = 0$ or $\bar\mu$ depending on whether $K \ge v_{n-1,\alpha}(i)+R$ or $K < v_{n-1,\alpha}(i)+R$. Let $H(i) = \lim_{\alpha\to0^+} v_\alpha(i) + R$ and define $\mu^*$ by

$$\mu^*(i) = \begin{cases}0, & \text{if } K\ge H(i),\\ \bar\mu, & \text{if } K<H(i).\end{cases}$$

Then it is clear from Theorems 11, 12, and 14 that $\mu^*$ is strongly optimal and that there is a strong planning horizon. (Notice that the "bang-bang" form of $\mu^*$ does not depend upon linearity in the holding cost function.)

Q.E.D.
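Example 2 can be reproduced approximately. As an assumption of this sketch, the interval $A = [0, \bar\mu]$ is replaced by a 21-point grid, with hypothetical data $\bar\mu = 2$, $K = 2.5$, $\lambda = 0.8$, $h = R = 1$, $\alpha = 0.1$; the truncation at $N$ and the absence of completions in state 0 are also assumptions. For each state unaffected by the truncation, the code confirms that the chosen rate is an endpoint of the grid and that $\bar\mu$ is chosen exactly when $K < v_{n-1,\alpha}(i) + R$.

```python
# Hypothetical illustration of the bang-bang form in Example 2.


def iterate(n_max, alpha, lam=0.8, rates=(0.0, 0.5, 1.0, 1.5, 2.0),
            cost=lambda m: m * m, h=1.0, R=1.0, N=60):
    """Hypothetical value iteration for (17): V[n][i] ~ V_{n,alpha}(i), mu_star[n][i] ~ mu*_{n,alpha}(i)."""
    mu_bar = max(rates)
    Lam = lam + mu_bar                          # uniformization constant Lambda
    V, mu_star = [[0.0] * (N + 1)], [None]      # V_0 = 0; no decision when n = 0
    for _ in range(n_max):
        old, new, pick = V[-1], [], []
        for i in range(N + 1):
            if i == 0:                          # assumption: no completion, hence no R, in state 0
                best = min((cost(m), m) for m in rates)
            else:                               # min over A of c_mu - mu * (v_{n,alpha}(i) + R)
                v = old[i] - old[i - 1]
                best = min((cost(m) - m * (v + R), m) for m in rates)
            up = old[min(i + 1, N)]             # assumption: arrivals blocked at the truncation level N
            new.append((h * i + lam * up + mu_bar * old[i] + best[0]) / (alpha + Lam))
            pick.append(best[1])                # tuple order breaks ties toward the slower rate
        V.append(new)
        mu_star.append(pick)
    return V, mu_star


K, mu_bar, R, n, N = 2.5, 2.0, 1.0, 12, 60
grid = tuple(mu_bar * k / 20 for k in range(21))   # hypothetical grid standing in for [0, mu_bar]
V, mu_star = iterate(n, alpha=0.1, rates=grid, cost=lambda m: K * m, R=R, N=N)
bang_bang = threshold_ok = True
for i in range(1, N - n + 2):                      # mu*_{n}(i) uses V_{n-1}: exact for these i
    v = V[n - 1][i] - V[n - 1][i - 1]
    choice = mu_star[n][i]
    bang_bang = bang_bang and choice in (0.0, mu_bar)
    threshold_ok = threshold_ok and ((choice == mu_bar) == (K < v + R))
print(bang_bang, threshold_ok)
```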

THEOREM 16: If $\lambda<\bar\mu$ and if either $\bar\mu$ is an isolated point of $A$ or $\{(c_{\bar\mu}-c_\mu)/(\bar\mu-\mu) : \mu\in A\setminus\{\bar\mu\}\}$ is a bounded set, then $h$, defined by $h(i) = \lim_{\alpha\to0^+}[V_\alpha(i) - V_\alpha(0)]$, is convex and satisfies the functional equation

$$h(i) = \frac{1}{\Lambda}\Big\{h\,i + \lambda h(i+1) + \bar\mu h(i) + \min_{\mu\in A}\big[c_\mu - \mu\big(h(i)-h(i-1)+R\big)\big]\Big\} - \frac{g}{\Lambda}, \tag{20}$$

where $g$ is the minimal average cost, and any stationary policy that selects a rate which minimizes the right side of (20) for each $i$ is average optimal. In particular, $\mu^*$ is average optimal.

Proof: By Lemma 4 and $\lambda<\bar\mu$, the assumptions of Theorem 4 of [11] are satisfied, so that $h(\cdot)$ exists, satisfies (20), and any stationary policy that selects a rate which minimizes the right side of (20) is average optimal, while convexity follows from Theorem 13. Left continuity of $c$ and $\mu^*_\alpha(i)$ nonincreasing in $\alpha$ yield

$$\min_{\mu\in A}\big\{c_\mu - \mu[R+h(i)-h(i-1)]\big\} = \lim_{\alpha\to0^+}\min_{\mu\in A}\big\{c_\mu - \mu[R+V_\alpha(i)-V_\alpha(i-1)]\big\} = c_{\mu^*(i)} - \mu^*(i)\big[R+h(i)-h(i-1)\big],$$

so that $\mu^*$ is, indeed, average optimal.

Q.E.D.

In the model of optimal customer selection, we were able to characterize a strongly optimal policy (if it exists) as the maximally accepting policy in the class of 0-optimal policies. For the variable service model, the characterization asserts that, given a choice, slower rates are preferred.

THEOREM 17: Suppose two stationary policies $\pi$ and $\sigma$ are 0-optimal and that $\pi(i) = \sigma(i)$ for $i\ne i'$ and $\pi(i') = \mu < \nu = \sigma(i')$. Then for each $\alpha>0$, the $\alpha$-discounted cost of $\pi$ is less than that of $\sigma$.

Proof: By hypothesis,

$$\lim_{\alpha\to0^+}\big[c_\mu - \mu\big(R+v_\alpha(i')\big)\big] = \lim_{\alpha\to0^+}\big[c_\nu - \nu\big(R+v_\alpha(i')\big)\big],$$

so that $v_\alpha(i')$ strictly decreasing in $\alpha$ yields the desired result.

Q.E.D.

In conclusion, we note a rather curious phenomenon. From (18) it is apparent that $v_\alpha(i) \ge (hi/\Lambda)\,[\Lambda/(\alpha+\Lambda)]^{i}$, so that

$$h(i) = \lim_{\alpha\to0^+}\big[V_\alpha(i)-V_\alpha(0)\big] = \lim_{\alpha\to0^+}\sum_{j=1}^{i}v_\alpha(j) \ge \sum_{j=1}^{i}\frac{hj}{\Lambda} = \frac{h\,i(i+1)}{2\Lambda}.$$

Thus, although the reward function is linearly bounded, the relative values $h(\cdot)$ grow at least quadratically.
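The quadratic lower bound on the relative values can be seen numerically. In this sketch (hypothetical data: $\lambda = 0.8$, $A = \{0, 0.5, 1, 1.5, 2\}$, $c_\mu = \mu^2$, $h = 1$, $\alpha = 0.01$, $n = 60$; $R = 0$, the treatment of state 0, and the truncation at $N = 100$ are our choices), `rel` holds $V_{n,\alpha}(i) - V_{n,\alpha}(0)$, `lower` is $\sum_{j\le i}(hj/\Lambda)[\Lambda/(\alpha+\Lambda)]^j$, which bounds it for $i \le n$, and `quad` is the $\alpha \to 0$ limit $h\,i(i+1)/(2\Lambda)$ of that bound.

```python
# Hypothetical illustration of the quadratic growth of the relative values.


def iterate(n_max, alpha, lam=0.8, rates=(0.0, 0.5, 1.0, 1.5, 2.0),
            cost=lambda m: m * m, h=1.0, R=1.0, N=60):
    """Hypothetical value iteration for (17): V[n][i] ~ V_{n,alpha}(i), mu_star[n][i] ~ mu*_{n,alpha}(i)."""
    mu_bar = max(rates)
    Lam = lam + mu_bar                          # uniformization constant Lambda
    V, mu_star = [[0.0] * (N + 1)], [None]      # V_0 = 0; no decision when n = 0
    for _ in range(n_max):
        old, new, pick = V[-1], [], []
        for i in range(N + 1):
            if i == 0:                          # assumption: no completion, hence no R, in state 0
                best = min((cost(m), m) for m in rates)
            else:                               # min over A of c_mu - mu * (v_{n,alpha}(i) + R)
                v = old[i] - old[i - 1]
                best = min((cost(m) - m * (v + R), m) for m in rates)
            up = old[min(i + 1, N)]             # assumption: arrivals blocked at the truncation level N
            new.append((h * i + lam * up + mu_bar * old[i] + best[0]) / (alpha + Lam))
            pick.append(best[1])                # tuple order breaks ties toward the slower rate
        V.append(new)
        mu_star.append(pick)
    return V, mu_star


alpha, lam, h, n, N = 0.01, 0.8, 1.0, 60, 100
Lam = lam + 2.0                                 # lambda + mu_bar for the default rate set
rho = Lam / (alpha + Lam)
V, _ = iterate(n, alpha=alpha, lam=lam, h=h, R=0.0, N=N)
top = N - n                                     # V_n(i) is exact for i <= N - n (and top <= n)
rel = [V[n][i] - V[n][0] for i in range(top + 1)]                       # V_n(i) - V_n(0)
lower = [sum(h * j / Lam * rho ** j for j in range(1, i + 1)) for i in range(top + 1)]
quad = [h * i * (i + 1) / (2 * Lam) for i in range(top + 1)]           # alpha -> 0 limit of `lower`
grows = all(r >= lo - 1e-9 for r, lo in zip(rel, lower))
print(grows, rel[top], lower[top], quad[top])
```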
VI. AN M/M/C WITH VARIABLE ARRIVAL RATE

Closely related to Crabill's M/M/1 model with variable service rate is Low's [13,14] M/M/c model with variable arrival rate. In contrast to Crabill's model, control of the system is effected by increasing or decreasing the price charged for the facility's service, thereby encouraging or discouraging the arrival of customers. This scenario is, of course, equivalent to the decision maker choosing an arrival rate $\lambda$ which in turn determines the price $p_\lambda$ that a customer is charged for admission to the queue.

We assume that the arrival rate lies in some closed subset $A$ of $[0,\bar\lambda]$, $\bar\lambda<\infty$, each of the $c<\infty$ exponential servers has rate $\mu$, and the queue capacity $Q$ is allowed to be either finite or infinite.

The state of the system is merely the number of customers in the system, and the cost structure consists of two parts: a holding cost $h_i$ per unit time that the system is in state $i$ and a reward or entrance fee $p_\lambda$ received whenever a customer enters the system at a point in time when the arrival rate is $\lambda$. As is reasonable from economic considerations, $p_\lambda$ is taken to be nonincreasing in $\lambda$ whereas $h_i$ is nondecreasing in $i$. In addition, $p_\lambda$ is assumed to be right-continuous, with $p_\lambda \ge 0$ and $p_0 < \infty$. Unlike Low, we must assume that $h_i$ is a convex function of $i$. Setting $\Lambda = \bar\lambda + \mu c$, we obtain the recursive equations ($V_{0,\alpha} = 0$)

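$$V_{n+1,\alpha}(i) = \frac{1}{\alpha+\Lambda}\min_{\lambda\in A}\big\{-\lambda p_\lambda + V_{n,\alpha}(i,\lambda)\big\}, \qquad i = 0,1,\dots,c+Q, \tag{21}$$

where

$$V_{n,\alpha}(i,\lambda) = h_i + \lambda V_{n,\alpha}(i+1) + \mu(i\wedge c)V_{n,\alpha}(i-1) + \big(\Lambda - \lambda - \mu(i\wedge c)\big)V_{n,\alpha}(i). \tag{22}$$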
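Recursion (21)-(22) can be sketched in the same way. Here the hypothetical data are $c = 2$ servers of rate $\mu = 1$, $A = \{0, 0.5, 1, 1.5, 2\}$ (so $\bar\lambda = 2$ and $\Lambda = 4$), price $p_\lambda = 5 - 2\lambda$ (nonincreasing and nonnegative on $A$), holding cost $h_i = i^2/2$ (convex and nondecreasing), and $\alpha = 0.1$. Instead of fixing $Q$, each iteration drops the highest state, so $V_{n,\alpha}$ is computed on states $0, \dots, N - n$ without any boundary rule; this mimics $Q = \infty$ and is a choice of ours. The final check is the convexity asserted in Theorem 18.

```python
# Hypothetical sketch of recursion (21)-(22) for the variable-arrival-rate M/M/c model.


def V_sequence(n_max, alpha, c=2, mu=1.0, rates=(0.0, 0.5, 1.0, 1.5, 2.0),
               price=lambda lam: 5.0 - 2.0 * lam, hold=lambda i: 0.5 * i * i, N=80):
    """Return [V_{0,alpha}, ..., V_{n_max,alpha}]; the list for V_n covers states 0..N-n only."""
    Lam = max(rates) + mu * c                        # Lambda = lambda_bar + mu c
    Vs = [[0.0] * (N + 1)]                           # V_0 = 0
    for _ in range(n_max):
        old, new = Vs[-1], []
        for i in range(len(old) - 1):                # drop the top state: no boundary rule needed
            served = mu * min(i, c)                  # mu (i ∧ c)
            down = old[i - 1] if i > 0 else 0.0      # irrelevant at i = 0 since served = 0
            best = min(-lam * price(lam) + hold(i) + lam * old[i + 1]
                       + served * down + (Lam - lam - served) * old[i]
                       for lam in rates)             # min over A of -lam p_lam + V_{n,alpha}(i, lam)
            new.append(best / (alpha + Lam))
        Vs.append(new)
    return Vs


Vs = V_sequence(20, alpha=0.1)
convex = all(V[i + 1] - 2 * V[i] + V[i - 1] >= -1e-8
             for V in Vs for i in range(1, len(V) - 1))
print(convex, len(Vs[-1]))
```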
As before, we begin by defining $\lambda^*_{n,\alpha}(i)$ to be an optimal arrival rate when the system is in state $i$, $n$ periods remain, and the discount factor is $\alpha>0$. Also define

$$v_{n,\alpha}(i) = V_{n,\alpha}(i) - V_{n,\alpha}(i-1).$$

Clearly, $v_{n,\alpha}(i) \ge 0$.

Low's  principal  thrust  was  the  development  of  an  efficient  algorithm 
for  the  average  cost  case  from  which  optimality  of  a  monotone  stationary 
policy  was  established.  We  now  extend  his  monotonicity  result  to  the 
finite  and  infinite  horizon  discounted  problem. 


THEOREM 18: For each $n$ and $\alpha>0$, $V_{n,\alpha}(\cdot)$ is convex, so that $\lambda^*_{n,\alpha}(i)$ is a nonincreasing function of $i$.

Proof: Convexity of $V_{1,\alpha}$ follows from that of $h_i$. Assume $V_{n,\alpha}$ is convex. Then using convexity of $V_{n,\alpha}$ and the method of Theorem 11, we obtain, after some simplification (suppressing the subscript $\alpha$),

$$(\alpha+\Lambda)\big[v_{n+1}(i+1) - v_{n+1}(i)\big]/\mu \ge c\big[v_n(i+1)-v_n(i)\big] - \big\{((i+1)\wedge c)\,v_n(i+1) - 2(i\wedge c)\,v_n(i) + ((i-1)\wedge c)\,v_n(i-1)\big\}.$$

The nonnegativity of this last expression is shown by considering the three cases $i+1\le c$, $i=c$, and $i-1\ge c$ separately. In each case, the desired result follows from convexity of $V_n$ together with $v_n \ge 0$. This completes the induction argument.
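The "simplification" above concerns only the service terms of (22). The sketch below, with hypothetical random data, builds a convex nondecreasing $V$, forms the service part $\mu(i\wedge c)V(i-1) + \mu(c - i\wedge c)V(i)$ (the share $\mu c - \mu(i\wedge c)$ of the coefficient of $V(i)$ in (22) once $\Lambda = \bar\lambda + \mu c$ is split), and checks that its second difference divided by $\mu$ equals the displayed right-hand side and is nonnegative across all three cases. It is a numerical spot check, not a proof.

```python
import random


def service_terms(V, i, c, mu):
    """Service part of (22) at state i >= 1: mu (i∧c) V(i-1) + mu (c - i∧c) V(i)."""
    k = min(i, c)
    return mu * k * V[i - 1] + mu * (c - k) * V[i]


def check(c, mu=1.5, size=14, seed=0):
    """Compare the second difference of the service part with the displayed expression."""
    rng = random.Random(seed)
    incs = sorted(rng.uniform(0.0, 5.0) for _ in range(size))  # v(1) <= ... <= v(size), all >= 0
    V = [0.0]
    for d in incs:
        V.append(V[-1] + d)                                      # convex, nondecreasing V
    v = lambda j: V[j] - V[j - 1]
    for i in range(2, size):                                     # needs states i-2 .. i+1
        lhs = (service_terms(V, i + 1, c, mu) - 2 * service_terms(V, i, c, mu)
               + service_terms(V, i - 1, c, mu)) / mu
        bracket = (min(i + 1, c) * v(i + 1) - 2 * min(i, c) * v(i)
                   + min(i - 1, c) * v(i - 1))
        rhs = c * (v(i + 1) - v(i)) - bracket
        if abs(lhs - rhs) > 1e-9 or rhs < -1e-9:
            return False
    return True


print(all(check(c, seed=s) for c in (1, 2, 3, 5) for s in range(25)))
```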


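Rewriting (21) as

$$V_{n+1,\alpha}(i) = \frac{1}{\alpha+\Lambda}\Big\{h_i + \Lambda V_{n,\alpha}(i) - \mu(i\wedge c)\,v_{n,\alpha}(i) + \min_{\lambda\in A}\big\{\lambda\big[v_{n,\alpha}(i+1) - p_\lambda\big]\big\}\Big\}, \tag{23}$$

we see that right-continuity of $p_\lambda$, together with $A$ closed, $v_{n,\alpha}(i+1)\ge0$, and $p_\lambda$ nonincreasing, guarantees the existence of a minimizing $\lambda$, whereas $\lambda^*_{n+1,\alpha}(i)$ nonincreasing in $i$ is obtained from $v_{n,\alpha}(i)$ nondecreasing in $i$.

Q.E.D.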
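The passage from (21)-(22) to (23), and the monotonicity of $\lambda^*$, can be spot-checked. This sketch uses the hypothetical data $c = 2$, $\mu = 1$, $A = \{0, 0.5, 1, 1.5, 2\}$, $p_\lambda = 5 - 2\lambda$, $h_i = i^2/2$, $\alpha = 0.1$, and drops the top state at each iteration (our choice, mimicking $Q = \infty$). The hypothetical helper `bellman` evaluates one step in either form; the code confirms that both forms give the same $V_{n+1,\alpha}(i)$ and that the smallest minimizing rate $\lambda^*_{n+1,\alpha}(i)$ is nonincreasing in $i$.

```python
# Hypothetical check that (21)-(22) and (23) agree, and that lambda* is nonincreasing in i.


def bellman(old, i, alpha, form, c=2, mu=1.0, rates=(0.0, 0.5, 1.0, 1.5, 2.0),
            price=lambda x: 5.0 - 2.0 * x, hold=lambda k: 0.5 * k * k):
    """One step of (21)-(22) (form=21) or (23) (form=23) at state i; needs i + 1 < len(old).

    Returns (V_{n+1,alpha}(i), smallest minimizing arrival rate).
    """
    Lam = max(rates) + mu * c                          # Lambda = lambda_bar + mu c
    served = mu * min(i, c)                            # mu (i ∧ c)
    if form == 21:
        down = old[i - 1] if i > 0 else 0.0
        val, lam = min((-x * price(x) + hold(i) + x * old[i + 1] + served * down
                        + (Lam - x - served) * old[i], x) for x in rates)
    else:
        v_i = old[i] - old[i - 1] if i > 0 else 0.0    # v_{n,alpha}(i); multiplied by 0 at i = 0
        v_up = old[i + 1] - old[i]                     # v_{n,alpha}(i+1)
        m, lam = min((x * (v_up - price(x)), x) for x in rates)
        val = hold(i) + Lam * old[i] - served * v_i + m
    return val / (alpha + Lam), lam


alpha, N, n = 0.1, 80, 25
V = [0.0] * (N + 1)                                    # V_0 on states 0..N
for _ in range(n):                                     # each step drops the top state
    V = [bellman(V, i, alpha, form=21)[0] for i in range(len(V) - 1)]

states = range(len(V) - 1)                             # V now holds V_n on states 0..N-n
same = all(abs(bellman(V, i, alpha, 21)[0] - bellman(V, i, alpha, 23)[0]) < 1e-8 for i in states)
lam_star = [bellman(V, i, alpha, 23)[1] for i in states]   # lambda*_{n+1,alpha}(i)
nonincreasing = all(a >= b for a, b in zip(lam_star, lam_star[1:]))
print(same, lam_star[:8])
```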

REMARK. Equation 23 reveals that if $p_\lambda$ is continuous, then $\lambda^*_{n+1,\alpha}(i)$ is nonincreasing in $i$ whether or not $p_\lambda$ is nonincreasing.

Letting $\bar\lambda$ play the role of $\bar\mu$ in the proofs of Theorems 12 and 14 and letting $\underline\lambda = \inf\{\lambda:\lambda\in A\}$ play the role of $\bar\mu$ in the proof of Lemma 4, the proofs of Theorems 12, 13, 14, 15 and Lemma 4 suffice, with but minimal changes, to establish the obvious analogs of Theorems 12, 13, 14, 15 and Lemma 4 for the variable arrival rate model. Assuming that $\underline\lambda < c\mu$ rather than $\lambda<\bar\mu$ if $Q=\infty$, the analog of Theorem 16 holds. Finally, the partial ordering on the set of 0-optimal stationary policies (cf. Theorems 10 and 17) indicates that faster arrival rates are preferred.

